A new Connected Component Analysis based System for Text Segmentation in Degraded Historical Document Images

  • V. S. Narayanan, N. Kasthuri, et al.

Abstract

Historical documents contain valuable heritage information. They are preserved in manuscript preservation centers and archaeological departments, but most are degraded, which makes their contents hard to read and understand. Text segmentation and feature extraction are therefore needed to convert these manuscripts into a machine-editable format. In this work, we present an effective way to segment historical document images into characters, a challenging task because of the complex image backgrounds. The method combines horizontal histograms, vertical histograms, and connected component analysis. The input image is first converted to grayscale, the grayscale image is binarized using Otsu's method, and all objects containing fewer than a desired number of pixels are removed. Line segmentation and word segmentation are implemented using the horizontal and vertical histogram methods, respectively. The connected components are then labeled and properties are measured for the image regions, and connected component analysis is used to segment and extract the individual characters. Simulation results show that the proposed segmentation method achieves an average accuracy of 93.37% on the HDLAC 2011 dataset. Moreover, the method is more efficient and better suited to real-time tasks.
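
The pipeline described in the abstract (Otsu binarization, small-object removal, histogram-based line and word segmentation, connected component extraction) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names, the `min_pixels` and `word_gap` parameters, the choice of 8-connectivity, the assumption of dark ink on a light background, and the synthetic `demo_page` are all assumptions made for illustration.

```python
from collections import deque

def otsu_threshold(gray):
    """Otsu threshold for a 2D list of integer intensities in 0..255."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        # Between-class variance, up to a constant factor.
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray):
    """Mark pixels at or below the threshold as text (assumes dark ink)."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]

def components(binary):
    """8-connected components, each returned as a list of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen, found = [[False] * w for _ in range(h)], []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            found.append(pixels)
    return found

def remove_small(binary, min_pixels):
    """Drop components with fewer than min_pixels foreground pixels."""
    clean = [[0] * len(binary[0]) for _ in binary]
    for pixels in components(binary):
        if len(pixels) >= min_pixels:
            for y, x in pixels:
                clean[y][x] = 1
    return clean

def runs(profile, min_gap=1):
    """(start, end) spans of nonzero bins; gaps shorter than min_gap are bridged."""
    spans = []
    for i, v in enumerate(profile):
        if not v:
            continue
        if spans and i - spans[-1][1] < min_gap:
            spans[-1][1] = i + 1
        else:
            spans.append([i, i + 1])
    return [tuple(s) for s in spans]

def segment(gray, min_pixels=2, word_gap=3):
    """Return lines -> words -> character boxes as (top, left, bottom, right)."""
    binary = remove_small(binarize(gray), min_pixels)
    result = []
    for top, bottom in runs([sum(row) for row in binary]):  # horizontal histogram
        band = binary[top:bottom]
        cols = [sum(col) for col in zip(*band)]              # vertical histogram
        words = []
        for left, right in runs(cols, word_gap):
            crop = [row[left:right] for row in band]
            boxes = []
            for pixels in components(crop):
                ys, xs = zip(*pixels)
                boxes.append((top + min(ys), left + min(xs),
                              top + max(ys) + 1, left + max(xs) + 1))
            words.append(sorted(boxes, key=lambda b: b[1]))
        result.append(words)
    return result

def demo_page():
    """Hypothetical 9x16 page: dark strokes (20) on a light background (255)."""
    page = [[255] * 16 for _ in range(9)]
    for top, bottom, left, right in [(1, 4, 1, 3), (1, 4, 4, 6), (1, 4, 10, 12), (6, 8, 2, 5)]:
        for y in range(top, bottom):
            page[y][left:right] = [20] * (right - left)
    page[4][14] = 20  # isolated speck, expected to be removed as noise
    return page
```

Calling `segment(demo_page())` yields two text lines: the first holds a two-character word and a one-character word, the second a single character, and the isolated speck is discarded. On real scans the `min_pixels` and `word_gap` values would need tuning, and a library such as scikit-image or OpenCV would normally replace the hand-written thresholding and labeling.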

Cite

Narayanan, V. S., Kasthuri, N., … Deepa, D. (2020). A new Connected Component Analysis based System for Text Segmentation in Degraded Historical Document Images. International Journal of Innovative Technology and Exploring Engineering, 9(6), 69–75. https://doi.org/10.35940/ijitee.f3503.049620
