Extraction of referential heading-entries in recognized table of contents pages


Abstract

This paper presents our research on extracting referential heading-entries from recognized table of contents (TOC) pages. The task faces two issues: the complexity of TOC layouts (e.g., a referential heading-entry may span one or several lines and may include decorative text), and text errors introduced by OCR processing in the training data. Our approach uses several layout-based and content-based features to classify the textual lines of TOC pages in the datasets. We also propose synthesis rules that combine related classified lines into referential heading-entries. The experiments are conducted on the ICDAR Book Structure Extraction Datasets 2009, 2011, and 2013. The results show that the proposed approach is more efficient than previous methods for extracting referential heading-entries.
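The abstract does not specify the paper's actual features, classifier, or synthesis rules. The Python sketch below only illustrates the general two-stage idea it describes: label each OCR'd line using simple content and layout cues, then merge consecutive labeled lines into heading-entries. Everything in it is a hypothetical assumption for illustration: the names `Line`, `Entry`, `classify`, `synthesize`, and `page_number`; the regular expressions; the `indent_step` parameter; and the rule that an entry ends at a line carrying a trailing page number (with O/o read as 0 and l/I read as 1 to tolerate common OCR confusions).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cues chosen for illustration; not the paper's feature set.
# Trailing page token, tolerating OCR confusions (O/o for 0, l/I for 1).
PAGE_TOKEN = re.compile(r"(?:^|[\s.])([0-9OolI]{1,4})\s*$")
DOT_LEADER = re.compile(r"(?:\.\s*){3,}$")  # dot leaders such as "....." or ". . ."
OCR_DIGITS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


@dataclass
class Line:
    text: str
    left: float  # x-coordinate of the line's left edge (a layout cue)


@dataclass
class Entry:
    title: str
    page: int
    level: int


def page_number(text: str) -> Optional[int]:
    """Return the trailing page number, or None; the token must contain a real digit."""
    m = PAGE_TOKEN.search(text)
    if m and any(ch.isdigit() for ch in m.group(1)):
        return int(m.group(1).translate(OCR_DIGITS))
    return None


def classify(line: Line) -> str:
    """Label a line as 'decorative', 'entry_end', or 'entry_part' (toy rules)."""
    if not any(ch.isalnum() for ch in line.text):
        return "decorative"  # e.g. separators such as '* * *' or '------'
    if page_number(line.text) is not None:
        return "entry_end"
    return "entry_part"


def synthesize(lines: List[Line], indent_step: float = 20.0) -> List[Entry]:
    """Merge consecutive classified lines into heading-entries (illustrative rule)."""
    entries: List[Entry] = []
    buffer: List[Line] = []
    for line in lines:
        label = classify(line)
        if label == "decorative":
            continue  # decorative lines never contribute to an entry
        buffer.append(line)
        if label == "entry_end":
            page = page_number(line.text)
            # Strip the page token, then any dot leader, from the closing line.
            last = DOT_LEADER.sub("", PAGE_TOKEN.sub("", line.text).rstrip())
            parts = [ln.text.strip() for ln in buffer[:-1]] + [last.strip()]
            title = " ".join(p for p in parts if p)
            # Indentation of the entry's first line approximates its level.
            level = round(buffer[0].left / indent_step)
            entries.append(Entry(title, page, level))
            buffer = []
    return entries  # an unterminated buffer at the end is dropped
```

For example, the lines `"2 Related work on table of contents"` and `"   recognition . . . . . . l2"` merge into one entry titled `"2 Related work on table of contents recognition"` on page 12. The hard-coded rules stand in for whatever classifier the paper uses; a learned classifier over similar features would replace `classify` while the merging step could stay the same.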

Citation

Nguyen, P. T., & Nguyen, D. T. (2015). Extraction of referential heading-entries in recognized table of contents pages. In Advances in Intelligent Systems and Computing (Vol. 348, pp. 1–9). Springer Verlag. https://doi.org/10.1007/978-3-319-18503-3_1