This paper describes our study of a new and robust approach to character segmentation of printed Bengali text. As in several other Indian scripts, the Bengali character set consists of basic, modified and conjunct characters. A Bengali text line has three prominent horizontal zones. Most characters appear only in the middle zone, while character modifiers, or parts of them, may appear in the upper and lower zones, vertically above or below another character. Vertical segmentation alone therefore produces a combinatorially large number of possible shapes, which makes the classification stage intractably difficult. The problem is usually tackled by a two-way approach that combines vertical and horizontal segmentation. However, existing approaches to horizontal segmentation of the lower zone frequently fail on old printed Bengali documents because of their characteristic typesetting; in several such documents there is no distinct lower zone at all. The proposed approach to segmenting modified Bengali characters in the lower zone does not require explicit identification of this lower zone. Instead, it applies a set of empirically designed rules to thinned images of Bengali text. © 2012 Springer-Verlag.
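To illustrate why vertical segmentation on its own is not enough, here is a minimal sketch of plain column-projection segmentation on a toy binary text line. The function names, the `TOY_LINE` image and the convention 1 = ink are all hypothetical. This is not the authors' rule-based method on thinned images. It only shows that a modifier printed below a character, within the same columns, ends up inside that character's vertical segment.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary text-line image, 1 = ink, 0 = background (assumed)


def vertical_segments(img: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges (end exclusive) of ink runs
    separated by all-background columns."""
    width = len(img[0]) if img else 0
    # Column projection profile: number of ink pixels in each column.
    profile = [sum(row[c] for row in img) for c in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for c, count in enumerate(profile):
        if count > 0 and start is None:
            start = c  # entering a run of inked columns
        elif count == 0 and start is not None:
            segments.append((start, c))  # leaving the run at an empty column
            start = None
    if start is not None:  # run touches the right edge
        segments.append((start, width))
    return segments


def rows_with_ink(img: Image, seg: Tuple[int, int]) -> List[int]:
    """Rows that contain ink inside the given column range."""
    s, e = seg
    return [r for r, row in enumerate(img) if any(row[s:e])]


# Hypothetical toy line: two middle-zone "characters" (rows 0-3) and a
# lower-zone "modifier" (row 5) sitting under the first one.
TOY_LINE: Image = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for seg in vertical_segments(TOY_LINE):
        print(seg, rows_with_ink(TOY_LINE, seg))
```

The first segment spans rows 0 to 3 plus row 5, so character and modifier reach the classifier as one combined shape. Each such combination multiplies the number of shape classes, which is the motivation for an additional horizontal split of the lower zone.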
Chaudhury, A., & Bhattacharya, U. (2012). Efficient segmentation of characters in printed Bengali texts. In Communications in Computer and Information Science (Vol. 305 CCIS, pp. 389–397). https://doi.org/10.1007/978-3-642-32112-2_45