A comparison of approaches for automated text extraction from scholarly figures

Falk Böschen; Ansgar Scherp

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10132 LNCS 15-27

DOI: 10.1007/978-3-319-51811-4_2

Abstract

So far, there has been no comparative evaluation of the different approaches for text extraction from scholarly figures. To fill this gap, we have defined a generic pipeline for text extraction that abstracts from the existing approaches documented in the literature. In this paper, we use this generic pipeline to systematically evaluate and compare 32 configurations for text extraction over four datasets of scholarly figures of different origins and characteristics. In total, our experiments have been run over more than 400 manually labeled figures. The experimental results show that the approach BS-4OS achieves the best F-measure of 0.67 for Text Location Detection and the best average Levenshtein distance of 4.71 between the recognized text and the gold standard across all four datasets when using the Ocropy OCR engine.
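
The two metrics named in the abstract can be illustrated with a minimal sketch using their textbook definitions. The abstract does not describe how the authors match detected text regions to the gold standard or how they aggregate scores, so the function names `levenshtein` and `f_measure` and the count-based inputs below are assumptions made for illustration only, not the paper's evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall (F1).
    The counts are hypothetical inputs; how a detection is judged
    correct is not specified in the abstract."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Under these definitions, a lower average Levenshtein distance means the recognized text is closer to the gold standard, while a higher F-measure means the detected text locations agree better with the labeled ones.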

Citation (APA)

Böschen, F., & Scherp, A. (2017). A comparison of approaches for automated text extraction from scholarly figures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10132 LNCS, pp. 15–27). Springer Verlag. https://doi.org/10.1007/978-3-319-51811-4_2
