Table representation learning using heterogeneous graph embedding

Willy Carlos Tchuitcheu; Tan Lu; Ann Dooms

Journal Article, Open Access

Pattern Recognition (2024) 156

DOI: 10.1016/j.patcog.2024.110734

Abstract

Tables, especially those with complex layouts, contain rich semantic information, yet learning from tables to uncover that information remains challenging. Rapid progress in natural language processing has not necessarily brought equivalent advances in table parsing, which often requires joint visual and language modeling. Humans can quickly derive the semantic meaning of table entries by associating them with the corresponding column and/or row headers. Motivated by this observation, we propose a new heterogeneous Graph-based Table Representation Learning (GTRL) framework. GTRL combines graph-based visual modeling with sequence-based language modeling to learn granular per-cell embeddings that are sensitive to the semantic meaning of cells within their table context. We systematically evaluate GTRL on two datasets: a new adhesive table benchmark comprising complex tables extracted from industrial documents, used for learning per-entry semantics, and a publicly available large-scale dataset that enables learning header semantics from column tables. Experimental results demonstrate the competitive performance of GTRL, which often exhibits reduced computational complexity compared to state-of-the-art table representation learning models.
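The idea of associating each table entry with its row and column headers can be illustrated with a small heterogeneous graph. The sketch below is not the GTRL implementation; it only assumes three node types (cell, row header, column header) and two edge types, and every name in it (`TableGraph`, `build_table_graph`, `cell_context`, the edge labels) as well as the table values are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableGraph:
    """Toy heterogeneous graph: typed nodes and typed, directed edges."""
    nodes: dict = field(default_factory=dict)  # node_id -> (node_type, text)
    edges: list = field(default_factory=list)  # (src_id, edge_type, dst_id)

    def add_node(self, node_id, node_type, text):
        self.nodes[node_id] = (node_type, text)

    def add_edge(self, src, edge_type, dst):
        self.edges.append((src, edge_type, dst))

    def neighbors(self, node_id, edge_type):
        # Targets reachable from node_id over edges of the given type.
        return [dst for src, et, dst in self.edges
                if src == node_id and et == edge_type]


def build_table_graph(col_headers, row_headers, cells):
    """Link every cell to its row header and its column header."""
    g = TableGraph()
    for j, text in enumerate(col_headers):
        g.add_node(("col", j), "col_header", text)
    for i, text in enumerate(row_headers):
        g.add_node(("row", i), "row_header", text)
    for i, row in enumerate(cells):
        for j, text in enumerate(row):
            g.add_node(("cell", i, j), "cell", text)
            g.add_edge(("cell", i, j), "in_column", ("col", j))
            g.add_edge(("cell", i, j), "in_row", ("row", i))
    return g


def cell_context(g, i, j):
    """Text a sequence model could read for one cell: headers, then value."""
    cell = ("cell", i, j)
    row_text = [g.nodes[n][1] for n in g.neighbors(cell, "in_row")]
    col_text = [g.nodes[n][1] for n in g.neighbors(cell, "in_column")]
    return " | ".join(row_text + col_text + [g.nodes[cell][1]])


# Made-up example table; the values are illustrative, not from the paper.
graph = build_table_graph(
    col_headers=["Shear strength (MPa)", "Cure time (h)"],
    row_headers=["Adhesive A", "Adhesive B"],
    cells=[["12.5", "24"], ["9.8", "48"]],
)
context = cell_context(graph, 1, 0)
print(context)
```

In this toy setup the bare value "9.8" only becomes meaningful once it is read together with its row and column headers, which is the intuition the abstract describes. A learned model would replace the string concatenation with embeddings propagated over the graph.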

Citation (APA)

Tchuitcheu, W. C., Lu, T., & Dooms, A. (2024). Table representation learning using heterogeneous graph embedding. Pattern Recognition, 156. https://doi.org/10.1016/j.patcog.2024.110734
