This chapter describes a complete system for recognizing unconstrained handwritten Arabic words using over-segmentation of characters and a variable duration hidden Markov model (VDHMM). First, a segmentation algorithm based on morphology and linguistic information translates the 2D image into a 1D sequence of subcharacter symbols. This sequence of symbols is modeled by a single contextual VDHMM. Written text generally carries two sources of information: shape information and linguistic information. Forty-five features are selected to represent the shape information of character and subcharacter symbols in the feature space. The shape information of each character symbol, i.e., a feature vector, is modeled either as an independently distributed multivariate discrete distribution or as a joint continuous distribution. Linguistic knowledge about character transitions is modeled as a Markov chain, where each character in the alphabet is a state and the bigram probabilities are the state transition probabilities. In this context, the variable duration state handles the segmentation ambiguity among consecutive characters: a single character may span more than one segment produced by over-segmentation. We outline the substantial effort expended to create a corpus of handwritten Arabic words and of characters extracted from these words. Using this corpus and the IFN dataset 2003, detailed experimental results are described to demonstrate the success of the proposed scheme.
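The interplay between bigram transitions and variable state duration can be illustrated with a small sketch. This is not the authors' implementation: the function names (`bigram_transitions`, `vd_viterbi`), the `emit` callback, and the dictionary-based probability tables are all hypothetical, and the shape model (the forty-five features) is reduced to a placeholder callable that scores a run of merged segments against a character.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_p(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def bigram_transitions(words):
    """Estimate P(next_char | char) by counting adjacent character pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        for a, b in zip(word, word[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in counts.items()
    }


def vd_viterbi(n_segments, states, init, trans, dur, emit, max_dur):
    """Variable-duration Viterbi over an over-segmented word (illustrative).

    init[s], trans[p][s], dur[s][d] are probabilities; emit(s, start, end)
    is an assumed callback giving the probability that segments
    start..end-1, merged together, form character s.
    Returns (best log score, [(state, start, end), ...]).
    """
    # best[t][s]: best log score of a path in which character s ends at boundary t
    best = [{s: NEG_INF for s in states} for _ in range(n_segments + 1)]
    back = [{s: None for s in states} for _ in range(n_segments + 1)]
    for end in range(1, n_segments + 1):
        for s in states:
            for d in range(1, min(max_dur, end) + 1):
                start = end - d
                local = log_p(dur.get(s, {}).get(d, 0.0)) + log_p(emit(s, start, end))
                if local == NEG_INF:
                    continue
                if start == 0:
                    # First character of the word: use the initial probability.
                    candidates = [(log_p(init.get(s, 0.0)) + local, None)]
                else:
                    # Otherwise extend the best path ending in each previous character.
                    candidates = [
                        (best[start][p] + log_p(trans.get(p, {}).get(s, 0.0)) + local, p)
                        for p in states
                    ]
                for score, prev in candidates:
                    if score > best[end][s]:
                        best[end][s] = score
                        back[end][s] = (prev, start)
    last = max(states, key=lambda s: best[n_segments][s])
    if best[n_segments][last] == NEG_INF:
        return NEG_INF, []
    # Walk the back-pointers to recover characters and their segment spans.
    path, s, t = [], last, n_segments
    while s is not None:
        prev, start = back[t][s]
        path.append((s, start, t))
        s, t = prev, start
    path.reverse()
    return best[n_segments][last], path
```

In this sketch the duration table `dur[s][d]` plays the role of the variable duration state: it lets one character absorb `d` consecutive segments, so the decoder chooses the segmentation and the character sequence jointly rather than committing to segment boundaries in advance.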
Kundu, A., & Hines, T. (2012). Arabic Handwriting Recognition Using VDHMM and Over-segmentation. In Guide to OCR for Arabic Scripts (pp. 507–540). Springer London. https://doi.org/10.1007/978-1-4471-4072-6_21