Automatic selection of table areas in documents for information extraction

Ana Costa E Silva; Alípio Jorge; Luís Torgo

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2902 (2003), pp. 460–465

DOI: 10.1007/978-3-540-24580-3_54

7 Citations

18 Readers

Abstract

The information contained in companies' financial statements is valuable to several users. Much of the relevant information in such documents is contained in tables and is currently mainly extracted by hand. We propose a method that accomplishes a prior step of the task of automatically extracting information from tables in documents: selecting the lines that are likely to belong to tables. Our method has been developed by empirically analyzing a set of Portuguese companies' financial statements using statistical and data mining techniques. Empirical evaluation indicates that more than 99% of table lines are selected after discarding at least 50% of all lines. The method can cope with the complexity of styles used in assembling information on paper and adapt its performance accordingly, thus maximizing its results. © Springer-Verlag Berlin Heidelberg 2003.

Cite

Costa E Silva, A., Jorge, A., & Torgo, L. (2003). Automatic selection of table areas in documents for information extraction. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2902, 460–465. https://doi.org/10.1007/978-3-540-24580-3_54
