A solution for segmenting Bangla word images, printed in different fonts with varying styles and sizes, into their constituent characters is reported here. First, three horizontally non-intersecting zones of a given word, viz. the upper, middle, and lower zones, are identified. Next, the black pixels that probably constitute the Matra of the word (the headline common to its characters and a prominent feature of Bangla script) are estimated. Some of the black pixels in the Matra region are then selected as potential segmentation points, and the word is segmented vertically at these points into its constituent characters. Each segmented component is then assigned to one of six possible component types: upper zone, middle zone, lower zone, middle-and-lower zone, broken character, or noise component. Components that span both the middle and lower zones are then separated horizontally. The methodology was tested on 1600 word images of different fonts with varying styles and sizes, and an average success rate of 96.85% was achieved. © 2012 Springer-Verlag GmbH Berlin Heidelberg.
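A minimal sketch of the Matra-based segmentation idea, assuming a binarized word image given as a list of pixel rows (1 = black, 0 = white). It uses a common projection-profile heuristic and is not the authors' exact procedure: the Matra is taken as the band of rows around the row with the most black pixels, and columns with no black pixels below that band separate characters that touch only through the Matra. The names `matra_band`, `segment_columns`, `cut_points` and the threshold `frac` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical input format: 1 = black pixel, 0 = white


def matra_band(img: Image, frac: float = 0.8) -> Tuple[int, int]:
    """Estimate the Matra as the contiguous band of rows around the densest row
    whose black-pixel counts reach `frac` of that row's count (assumed heuristic)."""
    counts = [sum(row) for row in img]              # horizontal projection profile
    peak = max(range(len(counts)), key=lambda r: counts[r])
    thresh = frac * counts[peak]
    top = peak
    while top > 0 and counts[top - 1] >= thresh:    # grow the band upwards
        top -= 1
    bottom = peak
    while bottom < len(counts) - 1 and counts[bottom + 1] >= thresh:  # and downwards
        bottom += 1
    return top, bottom


def segment_columns(img: Image, frac: float = 0.8) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) column ranges of components found below the Matra."""
    _, bottom = matra_band(img, frac)
    width = len(img[0])
    # Vertical projection restricted to the rows below the Matra band.
    below = [sum(img[r][c] for r in range(bottom + 1, len(img))) for c in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for c, v in enumerate(below):
        if v > 0 and start is None:
            start = c                               # a component begins
        elif v == 0 and start is not None:
            segments.append((start, c - 1))         # a component ends
            start = None
    if start is not None:
        segments.append((start, width - 1))
    return segments


def cut_points(segments: List[Tuple[int, int]]) -> List[int]:
    """Pick one cut column on the Matra midway between consecutive components."""
    return [(a_end + b_start) // 2
            for (_, a_end), (b_start, _) in zip(segments, segments[1:])]


if __name__ == "__main__":
    word = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1],  # Matra row
        [0, 1, 0, 0, 0, 0, 1, 1, 0],
        [0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 1, 1, 0, 0, 0, 1, 0, 0],
    ]
    segs = segment_columns(word)
    print(matra_band(word), segs, cut_points(segs))
```

On the toy word above, the Matra is row 1, the two components occupy columns 1 to 2 and 6 to 7, and column 4 is chosen as the cut point. This heuristic ignores upper-zone marks, characters that touch below the Matra, and the six-way component classification, all of which the reported method handles separately.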
Sarkar, R., Malakar, S., Das, N., Basu, S., Kundu, M., & Nasipuri, M. (2012). A font invariant character segmentation technique for printed Bangla word images. In Advances in Intelligent and Soft Computing (Vol. 132 AISC, pp. 739–746). Springer Verlag. https://doi.org/10.1007/978-3-642-27443-5_84