Efficient segmentation of characters in printed Bengali texts

Abstract

This paper describes a new and robust approach to character segmentation of printed Bengali text. Like several other Indian scripts, Bengali has a character set consisting of basic, modified and conjunct characters. A Bengali text line has three prominent horizontal zones. Most characters occupy only the middle zone, while character modifiers, or parts of them, may appear in the upper and lower zones, vertically above or below another character. Consequently, vertical segmentation alone yields a combinatorially large number of possible shapes, which makes the classification stage intractably difficult. The problem is therefore usually tackled with a two-way approach that combines vertical and horizontal segmentation. However, existing approaches to horizontal segmentation of the lower zone frequently fail on old printed Bengali documents because of their characteristic typesetting; in several such documents there is no distinct lower zone at all. The proposed approach to segmenting the modified Bengali characters of the lower zone does not require explicit identification of the lower zone. Instead, it applies a set of empirically designed rules to thinned images of Bengali text. © 2012 Springer-Verlag.

Chaudhury, A., & Bhattacharya, U. (2012). Efficient segmentation of characters in printed Bengali texts. In Communications in Computer and Information Science (Vol. 305 CCIS, pp. 389–397). https://doi.org/10.1007/978-3-642-32112-2_45