Performance bottleneck in HTML serializer due to repeated grid cell access #372

Hi,

I’ve run into a performance issue when serializing large tables in `docling_core/transforms/serializer/html.py`.

In `HTMLTableSerializer.serialize`, the code loops over the grid like this:
```
for i in range(nrows):
    body += "<tr>"
    for j in range(ncols):
        cell: TableCell = item.data.grid[i][j]
        ...
```

For big tables (thousands of rows × many columns), `item.data.grid[i][j]` is evaluated once per cell, and this nested indexing becomes the main bottleneck: profiling shows it dominates the conversion time.
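
To illustrate why the per-cell lookup could be so expensive, here is a minimal sketch. It assumes `grid` behaves like a property that rebuilds the whole grid on every access; I have not verified that this is how docling-core implements it. `FakeTableData` and `Cell` are hypothetical stand-ins, not docling-core classes.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a table cell."""
    text: str


@dataclass
class FakeTableData:
    """Hypothetical stand-in whose grid is rebuilt on every access (assumption)."""
    num_rows: int
    num_cols: int
    grid_builds: int = 0  # counts how often the full grid is rebuilt

    @property
    def grid(self):
        self.grid_builds += 1
        return [
            [Cell(f"{r},{c}") for c in range(self.num_cols)]
            for r in range(self.num_rows)
        ]


def count_builds_indexed(data):
    """Access data.grid[i][j] per cell, as in the current loop."""
    for i in range(data.num_rows):
        for j in range(data.num_cols):
            _ = data.grid[i][j]  # the property runs on every iteration
    return data.grid_builds


def count_builds_cached(data):
    """Read data.grid once, then iterate over the result."""
    grid = data.grid  # the property runs exactly once
    for row in grid:
        for _cell in row:
            pass
    return data.grid_builds


indexed = count_builds_indexed(FakeTableData(20, 5))
cached = count_builds_cached(FakeTableData(20, 5))
print(indexed, cached)
```

Under that assumption, the indexed loop rebuilds the grid once per cell, so the total work grows with the square of the cell count, while the cached loop rebuilds it once.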

Suggestion:
Fetch the grid once (or once per row) and iterate over it directly, so the lookup is not repeated for every cell. For example:

```
for row in item.data.grid:
    body += "<tr>"
    for cell in row:
        ...
```
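
A slightly fuller, self-contained sketch of the same idea follows. `Cell` and `serialize_rows` are hypothetical names for illustration, and the sketch ignores row and column spans, which the real serializer has to handle. Collecting fragments in a list and joining them once is an additional, optional change on top of iterating the grid directly.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Cell:
    """Hypothetical stand-in, not docling-core's TableCell."""
    text: str
    column_header: bool = False


def serialize_rows(grid: List[List[Cell]]) -> str:
    """Build the table body from a grid that was fetched once by the caller."""
    parts: List[str] = []
    for row in grid:
        parts.append("<tr>")
        for cell in row:
            tag = "th" if cell.column_header else "td"
            parts.append(f"<{tag}>{escape(cell.text)}</{tag}>")
        parts.append("</tr>")
    # Join once at the end instead of repeated string concatenation.
    return "".join(parts)


grid = [[Cell("a", True), Cell("b", True)], [Cell("1"), Cell("<2>")]]
html_body = serialize_rows(grid)
print(html_body)
```

In the real method this would correspond to reading `item.data.grid` into a local variable before the loop and passing that along.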

Alternatively, pre-extract the needed TableCell fields into a lighter structure before looping.
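
As a rough sketch of that alternative: the fields copied here (`text`, `row_span`, `col_span`, `column_header`) are my assumption about what the loop needs, and `Cell`, `LightCell` and `extract_light_grid` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, NamedTuple


@dataclass
class Cell:
    """Hypothetical stand-in for the full cell model."""
    text: str
    row_span: int = 1
    col_span: int = 1
    column_header: bool = False


class LightCell(NamedTuple):
    """Plain tuple holding only the fields the serializer loop reads."""
    text: str
    row_span: int
    col_span: int
    is_header: bool


def extract_light_grid(grid: List[List[Cell]]) -> List[List[LightCell]]:
    """Copy the needed fields once, before the serialization loop starts."""
    return [
        [LightCell(c.text, c.row_span, c.col_span, c.column_header) for c in row]
        for row in grid
    ]


full_grid = [[Cell("a", column_header=True)], [Cell("1", col_span=2)]]
light = extract_light_grid(full_grid)
print(light[1][0].col_span)
```

Whether this pays off beyond simply caching the grid would need to be measured.
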
Would you be open to a PR with this refactor? It should noticeably speed up serialization of large tables.

Thanks for your work! 🚀
