An experimental technique for OCR line and word segmentation using probability distribution estimation


Segmentation is an important step in designing an Optical Character Recognition (OCR) system for any script. In this paper, we focus on line and word segmentation in typewritten Gurmukhi script documents. To perform this task, we adopt an OCR-based methodology comprising several processing steps. Typewritten documents suffer from issues such as noise, skew, and poor document quality. We present a combined pre-processing scheme in which the image is thresholded using Niblack’s method and skew is corrected using a gradient histogram algorithm, so that a uniform orientation is obtained. A line segmentation scheme is then applied, in which a probability density function is used to generate the text distribution as a probability map. Because relating each piece of text to its exact line is challenging, we present a 2D Gaussian model that helps to identify the text boundaries in the x and y directions. The proposed methodology is applied to typewritten Gurmukhi documents, and an experimental study is carried out to show that the proposed approach achieves better performance than existing techniques.
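
The thresholding step named in the abstract can be illustrated with a minimal sketch of Niblack's local threshold, T = m + k·s, where m and s are the mean and standard deviation of the grey levels in a window around each pixel. This is not the authors' implementation: the function name `niblack_threshold`, the window size of 3, and k = -0.2 are assumptions chosen for illustration, since the abstract does not state the parameters used.

```python
import math


def niblack_threshold(image, window=3, k=-0.2):
    """Binarize a grayscale image (list of rows) with Niblack's local threshold.

    For each pixel, T = mean + k * std over a window x window neighbourhood
    (clipped at the image border). Pixels darker than T are marked as text (1).
    The defaults for `window` and `k` are illustrative assumptions.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clipped to the image bounds.
            values = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            threshold = mean + k * math.sqrt(variance)
            binary[y][x] = 1 if image[y][x] < threshold else 0
    return binary
```

A negative k lowers the threshold below the local mean, so only pixels clearly darker than their surroundings are kept as text. In practice the window is usually made large enough to cover at least one character, and an optimized library routine would replace this pure-Python loop.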




Goyal, R., Narula, R. K., & Kumar Jindal, M. (2019). An experimental technique for OCR line and word segmentation using probability distribution estimation. International Journal of Recent Technology and Engineering, 8(2 Special issue 3), 1484–1494.
