One step in a character recognition system is the segmentation of words or sub-words into characters. Segmenting text written in any Arabic script is especially difficult, and because of this many systems treat the sub-word, rather than the character, as the basic unit of recognition. We propose a method for segmenting printed Arabic words and sub-words into characters. In the proposed method, the primary and secondary strokes of each sub-word are separated, and segmentation points are then identified in the primary strokes. To do this, we compute the vertical projection graph for each line of text and process it to generate a string that indicates the relative variations in pixel counts. This string is then scanned to extract characters from the sub-words. We use Sindhi text to evaluate the segmentation, as its character set is a superset of the Arabic character set. The method can also be applied to other Naskh-based Arabic scripts such as Persian, Pashto and Urdu. © 2008 Springer-Verlag.
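As a rough illustration of the projection step, the sketch below computes a vertical projection for a small binary image and encodes how the ink count changes from column to column. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `U`/`D`/`S` symbol alphabet and the toy image are all hypothetical, and the abstract does not specify how the string is actually encoded or scanned.

```python
from typing import List


def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def variation_string(profile: List[int]) -> str:
    """Encode column-to-column changes in the projection.

    Hypothetical alphabet (an assumption, not taken from the paper):
    'U' = count rises, 'D' = count falls, 'S' = count stays the same.
    """
    symbols = []
    for prev, curr in zip(profile, profile[1:]):
        if curr > prev:
            symbols.append("U")
        elif curr < prev:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


# Toy primary stroke: two taller shapes joined by a thin baseline (bottom row).
line_image = [
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

profile = vertical_projection(line_image)
print(profile)                    # [1, 3, 3, 1, 1, 2, 3, 1]
print(variation_string(profile))  # USDSUUD
```

In this toy example the flat, low run between the two taller shapes (the `S` following a `D`) marks the thin connecting stroke, which is the kind of region where a segmentation point would plausibly be sought.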
Shaikh, N. A., Shaikh, Z. A., & Ali, G. (2008). Segmentation of Arabic text into characters for recognition. In Communications in Computer and Information Science (Vol. 20 CCIS, pp. 11–18). Springer Verlag. https://doi.org/10.1007/978-3-540-89853-5_4