Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering

Huseyin Kusetogullari

Book Chapter

Kusetogullari H

Springer (2018), pp. 23-32

DOI: 10.1007/978-3-319-56991-8_3

Abstract

In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. Text binarization faces many challenges, such as noise, faint characters and bleed-through, which must be overcome to increase the correct detection rate. To address these problems, a preprocessing strategy is first applied: the contrast is enhanced to improve faint characters, and a Gaussian Mixture Model (GMM) is used to ignore noise and other artifacts in the handwritten historical documents. The enhanced image is then normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values into two clusters using k-means clustering with k = 2, and then assigning each normalized pixel to one of the two clusters according to the minimum Euclidean distance between the pixel's normalized intensity and the mean normalized pixel value of each cluster. Experimental results verify the effectiveness of the proposed approach.
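The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: the function names `normalize` and `kmeans_binarize`, the min-max normalization, the initialization of the two centers at the darkest and brightest values, and the choice to treat the darker cluster as text are all assumptions made for illustration. The contrast enhancement and GMM preprocessing are omitted.

```python
import numpy as np


def normalize(image):
    """Min-max normalize a grayscale image to [0, 1].

    Assumption: the abstract does not say which normalization is used.
    """
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # Constant image: no contrast to normalize.
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def kmeans_binarize(image, max_iter=100, tol=1e-6):
    """Binarize a grayscale image with k-means (k = 2) on pixel intensity.

    Returns a boolean mask that is True where a pixel falls in the darker
    cluster (assumed here to be ink).
    """
    pixels = normalize(image)
    flat = pixels.ravel()

    # Assumption: start the two centers at the extreme intensities.
    centers = np.array([flat.min(), flat.max()])

    for _ in range(max_iter):
        # For scalar intensities the Euclidean distance is |x - c|.
        distances = np.abs(flat[:, None] - centers[None, :])
        labels = distances.argmin(axis=1)

        # Update each center to the mean of its assigned pixels.
        new_centers = centers.copy()
        for k in range(2):
            members = flat[labels == k]
            if members.size:
                new_centers[k] = members.mean()

        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break

    # Final assignment by minimum distance to the cluster means.
    labels = np.abs(flat[:, None] - centers[None, :]).argmin(axis=1)
    text_cluster = centers.argmin()  # assumption: dark ink on light paper
    return (labels == text_cluster).reshape(pixels.shape)


if __name__ == "__main__":
    # Tiny synthetic "page": three dark pixels, three bright pixels.
    page = np.array([[10, 12, 200],
                     [11, 210, 205]])
    print(kmeans_binarize(page))
```

Because the feature is a single intensity per pixel, the minimum-Euclidean-distance rule reduces to picking the nearer of the two cluster means, which amounts to a threshold at their midpoint.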

Citation (APA)

Kusetogullari, H. (2018). Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering. In Lecture Notes in Networks and Systems (Vol. 16, pp. 23–32). Springer. https://doi.org/10.1007/978-3-319-56991-8_3
