Layout analysis and content classification in digitized books

Abstract

Automatic layout analysis has proven to be extremely important in the digitization of large document collections. In this paper we present a mixed approach to layout analysis, introducing an SVM-aided layout segmentation process and a classification process based on local and geometrical features. The final output of the automatic analysis algorithm is a complete and structured annotation in JSON format, containing the digitized text as well as all the references to the illustrations of the input page, which can be used by both visualization and annotation interfaces. We evaluate our algorithm on a large dataset built upon the first volume of the “Enciclopedia Treccani”.
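
As a rough illustration of what a structured per-page JSON annotation of this kind might look like, the sketch below serializes a page's regions with Python's standard library. The abstract does not specify the schema, so every field name here (`page`, `regions`, `kind`, `bbox`, `text`, `image_ref`), the page identifier, and the file path are hypothetical and are not the authors' actual format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Region:
    """One classified region of a page (hypothetical schema, not the paper's)."""
    kind: str                        # assumed labels: "text" or "illustration"
    bbox: List[int]                  # assumed layout: [x, y, width, height] in pixels
    text: Optional[str] = None       # recognized text, for text regions
    image_ref: Optional[str] = None  # reference to a cropped image, for illustrations


def page_annotation(page_id: str, regions: List[Region]) -> str:
    """Serialize the regions of one page into a JSON string."""
    payload = {"page": page_id, "regions": [asdict(r) for r in regions]}
    # ensure_ascii=False keeps accented characters readable in the output.
    return json.dumps(payload, ensure_ascii=False, indent=2)


# Example values are invented purely for illustration.
regions = [
    Region(kind="text", bbox=[120, 80, 900, 400], text="Example paragraph text"),
    Region(kind="illustration", bbox=[120, 520, 600, 450],
           image_ref="figures/p0001_01.png"),
]
annotation = page_annotation("vol1_p0001", regions)
print(annotation)
```

A viewer or annotation tool could then load the string with `json.loads` and draw each region from its `bbox`, using `kind` to decide whether to show text or fetch the referenced image.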

Cite

Corbelli, A., Baraldi, L., Balducci, F., Grana, C., & Cucchiara, R. (2017). Layout analysis and content classification in digitized books. In Communications in Computer and Information Science (Vol. 701, pp. 153–165). Springer Verlag. https://doi.org/10.1007/978-3-319-56300-8_14
