Text detection in document images by machine learning algorithms

Abstract

In this paper, we consider the problem of text detection in document images. This problem plays an important role in OCR systems and is a challenging task. In the first step of the proposed approach, we use a self-adjusting bottom-up segmentation algorithm, based on the Sobel edge detection method, to segment a document image into a set of connected components (CCs). In the second step, each CC is described by 27 features, and a machine learning algorithm then classifies it as text or nontext. To test the approach, we collected a dataset (ASTRoID) that contains 500 images of text blocks and 500 images of nontext blocks. We empirically compare the performance of the proposed text detection method when using seven different machine learning algorithms.
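
The two steps described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the fixed `threshold` stands in for the paper's self-adjusting mechanism, the four features in `describe` are hypothetical stand-ins for the 27 features (which the abstract does not list), and the classification step is left out. All function names are illustrative assumptions.

```python
from collections import deque


def sobel_magnitude(img):
    """Approximate gradient magnitude (|gx| + |gy|) using 3x3 Sobel kernels.

    `img` is a list of rows of grayscale values; border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def connected_components(mask):
    """Group 8-connected True pixels; returns one list of (row, col) per CC."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def segment(img, threshold=100):
    """Step 1: Sobel edges, then CCs. The fixed threshold is an assumption."""
    edges = sobel_magnitude(img)
    mask = [[value >= threshold for value in row] for row in edges]
    return connected_components(mask)


def describe(pixels):
    """Step 2 (partial): a few hypothetical shape features for one CC."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,
        "fill_ratio": len(pixels) / (width * height),
    }


def make_demo_image():
    """White 12x24 page with two dark 4x4 squares (synthetic test input)."""
    img = [[255] * 24 for _ in range(12)]
    for y in range(3, 7):
        for x in list(range(3, 7)) + list(range(14, 18)):
            img[y][x] = 0
    return img


components = segment(make_demo_image())
features = [describe(c) for c in components]
```

In a full pipeline, each feature dictionary would be turned into a vector and passed to a trained classifier that labels the component as text or nontext; which classifier works best is the question the paper's comparison of seven algorithms addresses.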

Cite

Zelenika, D., Povh, J., & Ženko, B. (2016). Text detection in document images by machine learning algorithms. In Advances in Intelligent Systems and Computing (Vol. 403, pp. 169–179). Springer Verlag. https://doi.org/10.1007/978-3-319-26227-7_16
