Contributor

@ddoron9 ddoron9 commented Aug 26, 2025

Problem

Nested grid[i][j] access in HTMLTableSerializer.serialize was slow on large tables, because every cell lookup inside the nested loop went through Pydantic again.

Change

Replaced double indexing with row-level iteration (for row in grid: for cell in row:). Fixes #372

Impact

Removes repeated Pydantic lookups and improves serialization speed without changing output.
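
The difference between the two access patterns can be sketched as follows. This is a minimal illustration, not the actual docling code: `FakeTable`, `serialize_nested`, and `serialize_cached` are hypothetical names, and the sketch assumes the grid is rebuilt on every property access, which this conversation does not spell out.

```python
from dataclasses import dataclass


@dataclass
class FakeTable:
    """Hypothetical stand-in for a table model; not docling's real class."""

    num_rows: int
    num_cols: int
    grid_builds: int = 0  # counts how often the grid property is evaluated

    @property
    def grid(self) -> list[list[str]]:
        # Assumption: the grid is recomputed on each access.
        self.grid_builds += 1
        return [
            [f"r{i}c{j}" for j in range(self.num_cols)]
            for i in range(self.num_rows)
        ]


def serialize_nested(table: FakeTable) -> str:
    """Old pattern: index table.grid[i][j] inside the nested loop."""
    rows = []
    for i in range(table.num_rows):
        cells = "".join(
            f"<td>{table.grid[i][j]}</td>" for j in range(table.num_cols)
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


def serialize_cached(table: FakeTable) -> str:
    """New pattern: read the grid once, then iterate rows and cells."""
    grid = table.grid
    rows = []
    for row in grid:
        cells = "".join(f"<td>{cell}</td>" for cell in row)
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


old, new = FakeTable(50, 20), FakeTable(50, 20)
same_output = serialize_nested(old) == serialize_cached(new)
print(same_output, old.grid_builds, new.grid_builds)  # True 1000 1
```

In this sketch the nested version evaluates the property once per cell (1000 times for a 50x20 table) while the cached version evaluates it once, and both produce the same HTML string.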

Contributor

github-actions bot commented Aug 26, 2025

DCO Check Passed

Thanks @ddoron9, all your commits are properly signed off. 🎉

mergify bot commented Aug 26, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
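
As a rough local check, the title pattern above can be tried against the two titles this pull request has carried. The sketch assumes the rule's `~=` operator behaves like a regular-expression search and uses Python's `re` module as an approximation; the third title is a hypothetical non-conforming example.

```python
import re

# Pattern copied from the merge protection rule above.
TITLE_RE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)

titles = [
    "refactor: avoid nested grid access in HTMLTableSerializer",
    "perf: cache grid property in HTMLTableSerializer",
    "speed up HTML table serialization",  # hypothetical, lacks a type prefix
]

# Map each title to whether it satisfies the pattern.
results = {title: bool(TITLE_RE.search(title)) for title in titles}
for title, ok in results.items():
    print("ok  " if ok else "FAIL", title)
```

Both the original `refactor:` title and the later `perf:` title satisfy the pattern; only the prefix-less title fails.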

@ddoron9 ddoron9 marked this pull request as draft August 26, 2025 07:14
Refactor row/col iteration to bypass Pydantic __getitem__ calls,
reducing overhead from nested loops and improving table serialization speed.

Signed-off-by: doyikim <doyikim34@naver.com>
@ddoron9 ddoron9 force-pushed the html-table-serializer-perf branch from 726a9ec to 58bb55b Compare August 26, 2025 07:16
@ddoron9 ddoron9 marked this pull request as ready for review August 26, 2025 07:18
dosubot bot commented Aug 26, 2025

Related Documentation
0 document(s) may need updating based on files changed in this PR

@vagenas vagenas self-assigned this Aug 28, 2025
@vagenas vagenas self-requested a review August 28, 2025 15:06
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
@vagenas vagenas changed the title from "refactor: avoid nested grid access in HTMLTableSerializer" to "perf: cache grid property in HTMLTableSerializer" Aug 28, 2025
codecov bot commented Aug 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@vagenas vagenas merged commit 339bbd4 into docling-project:main Aug 28, 2025
11 checks passed
dosubot bot commented Aug 28, 2025

Documentation updates
Checked 2 published document(s). No updates required.

Collaborator

vagenas commented Aug 28, 2025

Thanks for the contribution @ddoron9!
I just fixed a minor autoflake finding. Make sure to install the pre-commit hooks so they run automatically for your next contribution 😉
Cheers!

Development

Successfully merging this pull request may close these issues.

Performance bottleneck in HTML serializer due to repeated grid cell access
3 participants