Commit 339bbd4

ddoron9dykim234vagenas
authored
perf: cache grid property in HTMLTableSerializer (#373)
* refactor: avoid nested grid access in HTMLTableSerializer

  Refactor row/col iteration to bypass Pydantic __getitem__ calls, reducing overhead from nested loops and improving table serialization speed.

  Signed-off-by: doyikim <doyikim34@naver.com>

* fix autoflake findings

  Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>

---------

Signed-off-by: doyikim <doyikim34@naver.com>
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
Co-authored-by: dykim <dykim34@crowdworks.kr>
Co-authored-by: Panos Vagenas <pva@zurich.ibm.com>
1 parent b2095b3 commit 339bbd4

1 file changed, +2 -7 lines changed
docling_core/transforms/serializer/html.py

Lines changed: 2 additions & 7 deletions
@@ -67,7 +67,6 @@
     PictureTabularChartData,
     RichTableCell,
     SectionHeaderItem,
-    TableCell,
     TableItem,
     TextItem,
     TitleItem,
@@ -347,9 +346,6 @@ def serialize(
         **kwargs: Any,
     ) -> SerializationResult:
         """Serializes the passed table item to HTML."""
-        nrows = item.data.num_rows
-        ncols = item.data.num_cols
-
         res_parts: list[SerializationResult] = []
         cap_res = doc_serializer.serialize_captions(item=item, tag="caption", **kwargs)
         if cap_res.text:
@@ -359,10 +355,9 @@ def serialize(
         body = ""
         span_source: Union[DocItem, list[SerializationResult]] = []

-        for i in range(nrows):
+        for i, row in enumerate(item.data.grid):
             body += "<tr>"
-            for j in range(ncols):
-                cell: TableCell = item.data.grid[i][j]
+            for j, cell in enumerate(row):

                 rowspan, rowstart = (
                     cell.row_span,
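
The gain from this change can be illustrated with a minimal sketch. It assumes that `grid` is a property whose value is rebuilt on every access, which is what the commit title suggests but the diff itself does not show. `ToyTable`, `render_indexed`, and `render_iterated` are hypothetical names invented for this example and are not part of docling-core.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table model; not the docling-core class."""

    num_rows: int
    num_cols: int
    grid_builds: int = field(default=0, init=False)

    @property
    def grid(self) -> list[list[str]]:
        # Assumption: the grid is rebuilt from scratch on every access.
        self.grid_builds += 1
        return [
            [f"r{i}c{j}" for j in range(self.num_cols)]
            for i in range(self.num_rows)
        ]


def render_indexed(table: ToyTable) -> str:
    """Old pattern: read the property inside the nested loop."""
    body = ""
    for i in range(table.num_rows):
        body += "<tr>"
        for j in range(table.num_cols):
            cell = table.grid[i][j]  # one grid build per cell
            body += f"<td>{cell}</td>"
        body += "</tr>"
    return body


def render_iterated(table: ToyTable) -> str:
    """New pattern: read the property once and iterate over its rows."""
    body = ""
    for row in table.grid:  # single grid build for the whole table
        body += "<tr>"
        for cell in row:
            body += f"<td>{cell}</td>"
        body += "</tr>"
    return body


old, new = ToyTable(3, 4), ToyTable(3, 4)
assert render_indexed(old) == render_iterated(new)  # identical HTML
print(old.grid_builds, new.grid_builds)  # 12 1
```

Under that assumption, the indexed loop pays for one rebuild per cell, while the iterated loop pays for one rebuild per table. The actual commit keeps the `i` and `j` indices via `enumerate` because the surrounding span logic still uses them.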
