Open
Labels: is-robustness-issue (From a user's perspective, this is about robustness)
Description
When parsing creation_date, I get IndexError: string index out of range
Environment
$ python -m platform
Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.28
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.0.0, crypt_provider=('cryptography', '45.0.6'), PIL=10.4.0
I initially used pypdf==5.5.0, and both 5.5.0 and 6.0.0 have the same outcome.
Code + PDF
This is a minimal, complete example that shows the issue:
from io import BytesIO
from pypdf import PdfReader

reader = PdfReader(stream=BytesIO(pdf_bytes))  # pdf_bytes: raw bytes of the attached PDF
metadata = reader.metadata
creation_date = str(metadata.creation_date) if metadata.creation_date else None  # IndexError here!
This is the file I used: guwunmong.pdf
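For anyone hitting the same error before a fix lands, a caller-side workaround is to catch the exception around the property access. This is a minimal sketch only: `safe_creation_date` is a hypothetical helper name, not part of pypdf, and it assumes the only failure to guard against is the IndexError shown in this report.

```python
from typing import Any, Optional


def safe_creation_date(metadata: Any) -> Optional[str]:
    """Return the creation date as a string, or None if absent or unparsable.

    Hypothetical helper for illustration; `metadata` is expected to be the
    object returned by `reader.metadata` (which may itself be None).
    """
    if metadata is None:
        return None
    try:
        value = metadata.creation_date
    except IndexError:
        # Raised in the affected versions when the stored date is an empty string.
        return None
    return str(value) if value else None
```

It would be called as `safe_creation_date(reader.metadata)` in place of the failing line above.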
Traceback
Below is the traceback (I masked my file name as it is irrelevant to the issue).
  File "/***masked***.py", line 123, in _parse_standard_metadata
    creation_date = str(metadata.creation_date) if metadata.creation_date else None
  File "/***masked***/lib/python3.10/site-packages/pypdf/_doc_common.py", line 216, in creation_date
    return parse_iso8824_date(self._get_text(DI.CREATION_DATE))
  File "/***masked***/lib/python3.10/site-packages/pypdf/_utils.py", line 85, in parse_iso8824_date
    if text[0].isdigit():
IndexError: string index out of range
I suspect the code below causes the issue:
https://github.com/py-pdf/pypdf/blob/6.0.0/pypdf/_utils.py#L82
def parse_iso8824_date(text: Optional[str]) -> Optional[datetime]:
    orgtext = text
    if text is None:
        return None
    if text[0].isdigit():  # <<< text = '' causes IndexError
I ran a debugger and found that when text is '' (empty string), text[0] causes IndexError.
The guard at the top of parse_iso8824_date should be if not text: rather than if text is None: so that the '' case is handled as well.
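The difference between the two guards can be sketched in isolation. This is a standalone illustration, not pypdf's actual code: `starts_with_digit_old` and `starts_with_digit_new` are hypothetical stand-ins for the first few lines of parse_iso8824_date, and they only show how each guard behaves for None, '' and a non-empty string.

```python
from typing import Optional


def starts_with_digit_old(text: Optional[str]) -> Optional[bool]:
    # Guard as quoted above: only None is treated as "no date".
    if text is None:
        return None
    return text[0].isdigit()  # raises IndexError when text == ''


def starts_with_digit_new(text: Optional[str]) -> Optional[bool]:
    # Suggested guard: both None and '' are treated as "no date".
    if not text:
        return None
    return text[0].isdigit()


if __name__ == "__main__":
    for candidate in (None, "", "D:20230101", "20230101"):
        print(repr(candidate), starts_with_digit_new(candidate))
```

With the suggested guard, an empty date string is reported as missing instead of raising.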
Thanks in advance.