Extracting words directly from handwritten document images remains a challenging problem in the development of a complete Optical Character Recognition (OCR) system. This paper reports a robust word extraction scheme. First, key points are generated from the document images using the Harris corner point detection algorithm and then clustered with the well-known DBSCAN technique. Finally, the boundaries of the text words in the document images are estimated from the convex hull drawn around each cluster of key points. The proposed technique is tested on 50 randomly selected images from the CMATERdb1 database, and the success rate is found to be 90.48%, which is comparable to the state of the art.
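The clustering and boundary-estimation stages can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Harris key points have already been extracted as (x, y) pixel coordinates by some corner detector, and the function names (`dbscan`, `convex_hull`, `word_hulls`) and the `eps` and `min_pts` values are hypothetical choices made for illustration only.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb)
        cluster += 1
    return labels


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def word_hulls(keypoints, eps=15.0, min_pts=3):
    """Cluster key points, then return one convex hull per cluster."""
    labels = dbscan(keypoints, eps, min_pts)
    clusters = {}
    for p, lab in zip(keypoints, labels):
        if lab != -1:  # discard noise points
            clusters.setdefault(lab, []).append(p)
    return [convex_hull(c) for _, c in sorted(clusters.items())]


# Toy key points: two dense groups (two "words") and one stray point.
keypoints = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5),
             (100, 0), (110, 0), (110, 10), (100, 10), (105, 5),
             (300, 300)]
hulls = word_hulls(keypoints)
```

On this toy input the two dense groups yield one hull each and the isolated point is discarded as noise. On real pages, the neighbourhood radius would presumably need tuning to the handwriting size and the spacing between words.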
CITATION STYLE
Singh, P. K., Chowdhury, S. P., Sinha, S., Eum, S., & Sarkar, R. (2017). Page-to-word extraction from unconstrained handwritten document images. In Advances in Intelligent Systems and Computing (Vol. 458, pp. 517–525). Springer Verlag. https://doi.org/10.1007/978-981-10-2035-3_53