IndexError when parsing creation_date #3446

@KyleJung0828

Description

Accessing metadata.creation_date raises IndexError: string index out of range when the PDF's creation date entry is an empty string.

Environment

$ python -m platform
Linux-4.18.0-425.19.2.el8_7.x86_64-x86_64-with-glibc2.28

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.0.0, crypt_provider=('cryptography', '45.0.6'), PIL=10.4.0

I initially used pypdf==5.5.0; both 5.5.0 and 6.0.0 fail in the same way.

Code + PDF

This is a minimal, complete example that shows the issue:

from io import BytesIO
from pypdf import PdfReader

reader = PdfReader(stream=BytesIO(pdf_bytes))  # pdf_bytes: raw bytes of the attached PDF
metadata = reader.metadata
creation_date = str(metadata.creation_date) if metadata.creation_date else None  # IndexError here!

This is the file I used: guwunmong.pdf

Traceback

Below is the traceback (I masked my file name as it is irrelevant to the issue).

  File "/***masked***.py", line 123, in _parse_standard_metadata
    creation_date = str(metadata.creation_date) if metadata.creation_date else None
  File "/***masked***/lib/python3.10/site-packages/pypdf/_doc_common.py", line 216, in creation_date
    return parse_iso8824_date(self._get_text(DI.CREATION_DATE))
  File "/***masked***/lib/python3.10/site-packages/pypdf/_utils.py", line 85, in parse_iso8824_date
    if text[0].isdigit():
IndexError: string index out of range
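
Until this is fixed in the library, a caller-side guard along these lines might avoid the crash. This is only a sketch: safe_creation_date is a hypothetical helper name (not part of pypdf), and it assumes that treating an unreadable creation date as missing is acceptable for the caller.

```python
from typing import Any, Optional


def safe_creation_date(metadata: Any) -> Optional[str]:
    """Return the creation date as a string, or None if missing or unreadable.

    Hypothetical helper, not part of pypdf. `metadata` is expected to be the
    object returned by `PdfReader.metadata`, which may itself be None.
    """
    if metadata is None:
        return None
    try:
        # Accessing the property can raise IndexError for an empty date string.
        value = metadata.creation_date
    except IndexError:
        return None
    return str(value) if value else None
```

With this helper, the failing line in my code would become creation_date = safe_creation_date(reader.metadata).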

I suspect the code below causes the issue:

https://github.com/py-pdf/pypdf/blob/6.0.0/pypdf/_utils.py#L82

def parse_iso8824_date(text: Optional[str]) -> Optional[datetime]:
    orgtext = text
    if text is None:
        return None
    if text[0].isdigit():  # <<< text = '' causes IndexError

I ran a debugger and found that text is '' (an empty string) for this file, so text[0] raises IndexError.
The guard at the top of the function (around line 80 of _utils.py) should be if not text: rather than if text is None:, so that the '' case is handled as well.
Thanks in advance.
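
To illustrate the suggested guard, here is a minimal sketch. It is not pypdf's actual implementation: parse_pdf_date_sketch is a hypothetical stand-in that only handles the basic D:YYYYMMDDHHmmSS form without a timezone suffix, and it exists only to show that if not text: covers both None and '' before any indexing happens.

```python
from datetime import datetime
from typing import Optional


def parse_pdf_date_sketch(text: Optional[str]) -> Optional[datetime]:
    """Simplified, hypothetical date parser used to demonstrate the guard.

    Handles only "D:YYYYMMDDHHmmSS" (or the same without the "D:" prefix).
    Other inputs raise ValueError from strptime.
    """
    if not text:  # covers both None and '' before text[0] is evaluated
        return None
    if text[0].isdigit():  # safe: text has at least one character here
        text = "D:" + text
    return datetime.strptime(text[:16], "D:%Y%m%d%H%M%S")
```

With this guard, an empty string returns None instead of raising, which matches how a missing value is already treated.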


Labels

is-robustness-issue: From a user's perspective, this is about robustness
