Hierarchical content classification and script determination for automatic document image processing

Abstract

Page segmentation and image content classification play an important role in automatic document image processing, with applications to mixed-type document image compression, form and check reading, and automatic mail sorting. In this paper, we propose an enhanced background-thinning based page segmentation algorithm that processes document images rapidly and eliminates some small regions embedded in other regions. We then present a hierarchical approach, which combines a cross-correlation measure, a Kolmogorov complexity measure, and a neural network, to classify sub-images as halftone or text. The approach also achieves high accuracy in script determination using a three-layer feed-forward network, which classifies each text region as Chinese or alphabetic characters. Experimental results on a number of mixed-type document images show the efficiency and effectiveness of our approach. © 2002 IEEE.
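The abstract does not say how the two measures are computed, so the following is only an illustrative sketch of the kind of features such a classifier might use, not the authors' method. It assumes a sub-image is a list of rows of 8-bit gray values, takes the cross-correlation between adjacent rows, and substitutes a zlib compression ratio for Kolmogorov complexity, which is uncomputable and is commonly approximated by compressed length. The function names are hypothetical.

```python
import random
import zlib


def row_correlation(a, b):
    """Normalized cross-correlation of two equal-length pixel rows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant row carries no correlation information
    return cov / (var_a * var_b) ** 0.5


def mean_adjacent_row_correlation(image):
    """Average correlation between each row and the row below it."""
    pairs = list(zip(image, image[1:]))
    return sum(row_correlation(a, b) for a, b in pairs) / len(pairs)


def compression_ratio(image):
    """Compressed size over raw size: an assumed stand-in for Kolmogorov complexity."""
    raw = bytes(p for row in image for p in row)
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    rng = random.Random(0)
    stripes = [[0, 255] * 32 for _ in range(64)]  # highly regular pattern
    noise = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]
    for name, img in (("stripes", stripes), ("noise", noise)):
        print(name, mean_adjacent_row_correlation(img), compression_ratio(img))
```

In a hierarchical scheme of the sort the abstract describes, inexpensive features like these could settle the clear cases first, leaving the ambiguous sub-images to the neural network. Any thresholds or feature ordering would have to come from the paper itself.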

Wang, Q., Chi, Z., & Zhao, R. (2002). Hierarchical content classification and script determination for automatic document image processing. In Proceedings - International Conference on Pattern Recognition (Vol. 16, pp. 77–80). https://doi.org/10.1109/icpr.2002.1047799
