A comparison of two unsupervised table recognition methods from digital scientific articles

Abstract

In digital scientific articles, tables are a common way of presenting information in a structured form. However, the large variability of table layouts and the lack of structural information in digital document formats pose significant challenges for information retrieval and related tasks. In this paper we present two table recognition methods, based on unsupervised learning techniques and heuristics, which automatically detect both the location and the structure of tables within an article stored as PDF. For both algorithms, the table region detection first identifies the bounding boxes of individual tables from a set of labelled text blocks. In the second step, two different tabular structure detection methods extract a rectangular grid of table cells from the set of words contained in these table regions. We evaluate each stage of the algorithms separately and compare performance values on two data sets from different domains. We find that the table recognition performance is in line with state-of-the-art commercial systems and generalises to the non-scientific domain.
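The second step, turning the words of a table region into a rectangular grid of cells, can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It is a minimal illustration that assumes each word comes with the coordinates of its top-left corner and that columns and rows can be found by a simple gap-based one-dimensional clustering of those coordinates. The names `Word`, `cluster_1d`, `words_to_grid`, and the thresholds `x_gap` and `y_gap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word inside a detected table region (hypothetical structure)."""
    text: str
    x: float  # left edge of the word
    y: float  # top edge of the word


def cluster_1d(values, gap):
    """Group values into clusters, starting a new cluster whenever the
    distance to the previous sorted value exceeds `gap`.
    Returns the mean of each cluster, in ascending order."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]


def nearest(centres, v):
    """Index of the cluster centre closest to v."""
    return min(range(len(centres)), key=lambda i: abs(centres[i] - v))


def words_to_grid(words, x_gap=15.0, y_gap=5.0):
    """Assign each word to a (row, column) cell of a rectangular grid.
    The gap thresholds are assumed values, not taken from the paper."""
    col_centres = cluster_1d([w.x for w in words], x_gap)
    row_centres = cluster_1d([w.y for w in words], y_gap)
    grid = [["" for _ in col_centres] for _ in row_centres]
    # Visit words in reading order so that text within a cell stays ordered.
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        r = nearest(row_centres, w.y)
        c = nearest(col_centres, w.x)
        grid[r][c] = (grid[r][c] + " " + w.text).strip()
    return grid


words = [
    Word("Method", 10, 100), Word("F1", 80, 100),
    Word("A", 10, 120), Word("0.9", 81, 120),
]
grid = words_to_grid(words)
print(grid)
```

Clustering on left edges alone only suits left-aligned, single-word cells. Multi-word cells, spanning cells, and right-aligned or centred columns would need additional heuristics beyond this sketch.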

Klampfl, S., Jack, K., & Kern, R. (2014). A comparison of two unsupervised table recognition methods from digital scientific articles. D-Lib Magazine, 20(11–12). https://doi.org/10.1045/november14-klampfl
