Document layout analysis for semantic information extraction

Weronika T. Adrian; Nicola Leone; Marco Manna; Cinzia Marte

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10640 LNAI 269-281

DOI: 10.1007/978-3-319-70169-1_20



Abstract

Using machines to automatically extract relevant information from unstructured and semi-structured sources has practical significance in today's life and business. In this context, although understanding the meaning of words is important, the process of identifying self-consistent geometric and logical regions of interest (blocks, cells, columns and tables, as well as paragraphs, titles and captions, to mention only a few) is of paramount importance too. This complex process is known as document layout analysis. In this work, we discuss newly designed techniques that solve this problem effectively by combining both the syntactic and the semantic aspects of documents. The techniques described here form the basis of KnowRex, a comprehensive system for ontology-driven information extraction.
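The abstract does not say how geometric regions are detected, so the following is only a minimal sketch of one common, generic idea (grouping text tokens into blocks by spatial proximity), not the technique used in KnowRex. It assumes each token arrives with an axis-aligned bounding box; the names `Box`, `gap`, `group_blocks` and the thresholds `max_dx` and `max_dy` are hypothetical, chosen for illustration. It covers only the geometric (syntactic) side; the semantic side the authors combine it with is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of one text token (hypothetical input format: x grows right, y grows down)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def gap(a: Box, b: Box) -> tuple[float, float]:
    """Horizontal and vertical gaps between two boxes (0 on an axis where they overlap)."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return dx, dy


def group_blocks(boxes: list[Box], max_dx: float = 10.0, max_dy: float = 5.0) -> list[list[Box]]:
    """Cluster tokens into blocks: two tokens are linked when both gaps are within the
    (hypothetical) thresholds, and links are merged transitively with union-find."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            dx, dy = gap(boxes[i], boxes[j])
            if dx <= max_dx and dy <= max_dy:
                parent[find(i)] = find(j)

    blocks: dict[int, list[Box]] = {}
    for i, box in enumerate(boxes):
        blocks.setdefault(find(i), []).append(box)
    # Reading order: topmost block first, ties broken by leftmost edge.
    return sorted(blocks.values(), key=lambda blk: (min(b.y0 for b in blk), min(b.x0 for b in blk)))


# Usage: three nearby tokens form one block; a distant token forms a second "column".
tokens = [
    Box(0, 0, 40, 10, "Document"),
    Box(45, 0, 80, 10, "layout"),
    Box(0, 14, 60, 24, "analysis"),
    Box(200, 0, 240, 10, "Column2"),
]
for block in group_blocks(tokens):
    print([b.text for b in block])
```

Single-link clustering like this is easy to reason about but quadratic in the number of tokens and sensitive to the chosen gaps; real systems typically refine such raw blocks with further logical labeling (titles, captions, table cells), which is where semantic knowledge can help.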


Citation (APA style)

Adrian, W. T., Leone, N., Manna, M., & Marte, C. (2017). Document layout analysis for semantic information extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10640 LNAI, pp. 269–281). Springer Verlag. https://doi.org/10.1007/978-3-319-70169-1_20
