Efficient segmentation of characters in printed Bengali texts

Ayan Chaudhury; Ujjwal Bhattacharya

Conference Proceedings

Communications in Computer and Information Science (2012) 305 CCIS 389-397

DOI: 10.1007/978-3-642-32112-2_45

Abstract

This paper describes our study of a new and robust approach for character segmentation of printed Bengali text. Like several other Indian scripts, the character set of Bengali consists of basic, modified and conjunct characters. A text line of Bengali has three prominent horizontal zones. Most of its characters appear only in the middle zone, while character modifiers or their parts may appear in the upper and lower zones, vertically above or below another character. Thus, vertical segmentation alone produces a combinatorially large number of possible shapes, making the classification stage intractable. Usually, the problem is tackled by a two-way approach that considers both vertical and horizontal segmentation. Existing approaches for horizontal segmentation of the lower zone frequently fail on old printed Bengali documents because of their typical typesetting. In fact, several such documents have no distinct lower zone. The proposed approach for segmenting modified Bengali characters of the lower zone does not require explicit identification of this zone; it is based on a set of empirically designed rules applied to thinned images of Bengali texts. © 2012 Springer-Verlag.
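To make the idea of vertical segmentation concrete, here is a minimal sketch of a generic projection-profile baseline. It is not the rule-based method of the paper, which works on thinned images and is not detailed in the abstract. The sketch assumes a binarized text-line image with ink pixels equal to 1, and assumes the headline is a single pixel row carrying the most ink; the function names and the toy image are hypothetical.

```python
import numpy as np

def find_headline_row(binary):
    """Return the index of the row with the most ink pixels.

    Assumption: this row is the headline joining the characters of a word.
    """
    return int(np.argmax(binary.sum(axis=1)))

def vertical_segments(binary, headline_row):
    """Split the region below the headline at ink-free columns.

    Returns a list of (start, end) column ranges, with end exclusive.
    """
    below = binary[headline_row + 1:, :]
    has_ink = below.sum(axis=0) > 0
    segments = []
    start = None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col                      # a new segment begins
        elif not ink and start is not None:
            segments.append((start, col))    # segment ends at a blank column
            start = None
    if start is not None:
        segments.append((start, len(has_ink)))
    return segments

# Toy text line: 6 rows by 9 columns, 1 = ink (hypothetical data).
line = np.zeros((6, 9), dtype=int)
line[1, :] = 1          # headline spanning the whole word
line[2:5, 1:3] = 1      # first character body
line[2:5, 5:8] = 1      # second character body
line[0, 2] = 1          # a modifier part above the headline

headline = find_headline_row(line)
segments = vertical_segments(line, headline)
print(headline, segments)
```

A column cut of this kind keeps anything stacked above or below a character inside the same segment, which is why vertical segmentation alone multiplies the number of distinct shapes a classifier must handle.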

Cite

Chaudhury, A., & Bhattacharya, U. (2012). Efficient segmentation of characters in printed Bengali texts. In Communications in Computer and Information Science (Vol. 305 CCIS, pp. 389–397). https://doi.org/10.1007/978-3-642-32112-2_45
