Document image understanding through iterative transductive learning

Michelangelo Ceci; Corrado Loglisci; Lucrezia Macchia; Donato Malerba; Luciano Quercia

Conference Proceedings

Communications in Computer and Information Science (2013) 354 CCIS 117-128

DOI: 10.1007/978-3-642-35834-0_13


Abstract

In Document Image Understanding, one of the fundamental tasks is recognizing semantically relevant components in the layout extracted from a document image. This process can be automated by learning classifiers that label such components automatically. However, the learning process assumes the availability of a large set of documents whose layout components have already been labeled manually. This contrasts with the more common situation in which only a few labeled documents and an abundance of unlabeled ones are available. In addition, labeling document layouts adds further complexity because of the multi-modal nature of the components (textual and spatial information may coexist). In this work, we investigate the application of a relational classifier that works in the transductive setting. The relational setting is justified by the multi-modal nature of the data we are dealing with, while transduction is justified by the possibility of exploiting the large amount of information conveyed by the unlabeled layout components. The classifier bootstraps the labeling process iteratively: reliable classifications are used as training examples in subsequent iterations. The proposed computational solution has been evaluated on document images of scientific literature. © 2013 Springer-Verlag.
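
The iterative bootstrapping described in the abstract can be illustrated with a generic self-training loop. The sketch below is not the authors' relational classifier: it substitutes a simple nearest-centroid classifier, and the feature vectors, the class names `"title"` and `"body"`, the margin-based confidence score, and the `threshold` and `max_iter` parameters are all assumptions made for illustration only.

```python
from math import dist


def centroids(labeled):
    """Compute the mean feature vector of each class from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(cents, x):
    """Return (label, confidence) for x; assumes at least two classes.

    Confidence is a hypothetical margin score in [0.5, 1]: it approaches 1
    when x is much closer to the nearest centroid than to the second nearest.
    """
    ranked = sorted((dist(x, c), y) for y, c in cents.items())
    (d1, y1), (d2, _) = ranked[0], ranked[1]
    confidence = 1.0 - d1 / (d1 + d2) if d1 + d2 > 0 else 0.5
    return y1, confidence


def self_train(labeled, unlabeled, threshold=0.8, max_iter=10):
    """Iteratively promote confident predictions to training examples."""
    labeled = list(labeled)
    pending = list(unlabeled)
    for _ in range(max_iter):
        if not pending:
            break
        cents = centroids(labeled)
        scored = [(x, *predict(cents, x)) for x in pending]
        reliable = [(x, y) for x, y, c in scored if c >= threshold]
        if not reliable:
            break  # nothing confident enough; stop bootstrapping
        labeled.extend(reliable)
        pending = [x for x, _, c in scored if c < threshold]
    # Transductive output: a label for every originally unlabeled example.
    cents = centroids(labeled)
    return {x: predict(cents, x)[0] for x in unlabeled}


# Toy data: two labeled layout components and four unlabeled ones,
# each described by a made-up 2-D feature vector.
seed = [((0.0, 0.0), "title"), ((10.0, 10.0), "body")]
pool = [(1.0, 1.0), (9.0, 9.0), (4.0, 4.0), (6.0, 6.0)]
labels = self_train(seed, pool)
```

In this toy run the two components nearest the seeds are labeled confidently in the first iteration and join the training set; the two ambiguous ones never reach the threshold and are labeled only by the final model. A real system along the lines of the abstract would replace the centroid classifier with one that also exploits relations between layout components.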

Cite

Ceci, M., Loglisci, C., Macchia, L., Malerba, D., & Quercia, L. (2013). Document image understanding through iterative transductive learning. In Communications in Computer and Information Science (Vol. 354 CCIS, pp. 117–128). Springer Verlag. https://doi.org/10.1007/978-3-642-35834-0_13
