State estimation in a document image and its application in text block identification and text line extraction

Hyung Il Koo; Nam Ik Cho

Conference Proceedings (Open Access)


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6312 LNCS(PART 2) 421-434

DOI: 10.1007/978-3-642-15552-9_31

26 Citations

46 Readers

Abstract

This paper proposes a new approach to estimating document states such as interline spacing and text line orientation, which facilitates a number of tasks in document image processing. The proposed method can be applied to spatially varying states as well as invariant ones, so general cases, including images with complex layouts, camera-captured images, and handwritten documents, can also be handled. Specifically, we find connected components (CCs) in a document image and assign a state to each of them. The states of the CCs are then estimated within an energy minimization framework, where the cost function is designed based on frequency domain analysis and minimized via graph cuts. Using the estimated states, we also develop a new algorithm that performs text block identification and text line extraction. Roughly speaking, we segment an image into text blocks by cutting connections between CCs that are long relative to the estimated interline spacing, and we group the CCs into text lines by bottom-up grouping along the estimated text line orientation. Experimental results on a variety of document images show that our method is efficient and provides promising results in several document image processing tasks. © 2010 Springer-Verlag.
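The two post-processing steps described in the abstract (cutting long CC connections to form text blocks, then grouping CCs into lines along the text orientation) can be illustrated with a simplified sketch. It assumes the interline spacing and orientation have already been estimated and, unlike the paper, treats them as single page-wide values rather than per-CC states; each CC is reduced to its centroid. The function names `split_text_blocks` and `group_text_lines`, the thresholds `factor`, `perp_tol`, and `max_gap`, and the union-find grouping are hypothetical choices for illustration, not the authors' procedure or parameters.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) centroid of one connected component (CC)


class UnionFind:
    """Disjoint-set forest used to merge linked CCs into groups."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri

    def groups(self) -> List[List[int]]:
        """Return groups of CC indices, each listed in ascending order."""
        out: Dict[int, List[int]] = {}
        for i in range(len(self.parent)):
            out.setdefault(self.find(i), []).append(i)
        return sorted(out.values())


def split_text_blocks(centroids: Sequence[Point], spacing: float,
                      factor: float = 1.5) -> List[List[int]]:
    """Hypothetical helper: keep links between CCs closer than factor * spacing
    (longer links are 'cut') and return the connected groups as text blocks."""
    uf = UnionFind(len(centroids))
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) < factor * spacing:
                uf.union(i, j)
    return uf.groups()


def group_text_lines(centroids: Sequence[Point], theta: float, spacing: float,
                     perp_tol: float = 0.3, max_gap: float = 2.0) -> List[List[int]]:
    """Hypothetical helper: bottom-up grouping of CCs into text lines along
    orientation theta (radians). Two CCs are linked when their offset is nearly
    parallel to the text direction and not too far apart along it."""
    dx, dy = math.cos(theta), math.sin(theta)  # unit vector along the text line
    nx, ny = -dy, dx                           # unit normal to the text line
    uf = UnionFind(len(centroids))
    for i, (xi, yi) in enumerate(centroids):
        for j in range(i + 1, len(centroids)):
            vx, vy = centroids[j][0] - xi, centroids[j][1] - yi
            along = abs(vx * dx + vy * dy)    # separation along the line
            across = abs(vx * nx + vy * ny)   # separation across lines
            if across < perp_tol * spacing and along < max_gap * spacing:
                uf.union(i, j)
    # Order the CCs of each line by their position along the text direction.
    return [sorted(line, key=lambda k: centroids[k][0] * dx + centroids[k][1] * dy)
            for line in uf.groups()]
```

For example, with centroids `[(0, 0), (10, 0), (20, 0), (0, 12), (10, 12), (20, 12), (200, 0), (210, 0)]` and `spacing=12`, `split_text_blocks` returns `[[0, 1, 2, 3, 4, 5], [6, 7]]`, and `group_text_lines(..., theta=0.0, spacing=12)` returns `[[0, 1, 2], [3, 4, 5], [6, 7]]`. In practice one would call `group_text_lines` on each block separately so that lines cannot bridge blocks, and replace the O(n²) pair test with a neighbor structure such as a k-d tree.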



Citation (APA)

Koo, H. I., & Cho, N. I. (2010). State estimation in a document image and its application in text block identification and text line extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6312 LNCS, pp. 421–434). Springer Verlag. https://doi.org/10.1007/978-3-642-15552-9_31

Readers' Seniority

PhD / Post grad / Masters / Doc: 28 (70%)

Researcher: 8 (20%)

Professor / Associate Prof.: 4 (10%)

Readers' Discipline

Computer Science: 26 (72%)

Engineering: 8 (22%)

Earth and Planetary Sciences: 1

Physics and Astronomy: 1


References (selected)

Fast approximate energy minimization via graph cuts

Feature detection with automatic scale selection

The document spectrum for page layout analysis

Cited by (selected)

Computer vision for assistive technologies

A comprehensive survey of mostly textual document segmentation algorithms since 2008

Language-independent text-line extraction algorithm for handwritten documents
