Recognition of Hindi and Bengali handwritten and typed text from images using Tesseract on Android platform

Shubhendu Banerjee; Sumit Kumar Singh; Atanu Das; Rajib Bag

Journal Article

International Journal of Innovative Technology and Exploring Engineering (2019) 9(1) 3507-3516

DOI: 10.35940/ijitee.A5252.119119

Citations: 1

Readers: 8

Abstract

Digitization has transformed data conversion, storage and sharing by converting non-editable typographic and handwritten text into editable electronic text. Although Optical Character Recognition (OCR) has been applied to many languages worldwide, satisfactory output has been observed in only a few. This paper takes a step toward digitizing two of the most widely spoken languages of the Indian subcontinent, Hindi and Bengali, using Google's open-source OCR engine, Tesseract. The scripts of these two languages, both of Brahmi origin, pose their own challenges owing to their varied traits of character segmentation and word formation. Here, Tesseract is trained on data sets of Hindi and Bengali typographic and handwritten characters, and this training is combined with a distinctive pre-processing stage involving input image customization and image augmentation. The pre-processing significantly enhances image quality, allowing Tesseract to produce more accurate results, especially for handwritten text and obscure images. The system also incorporates English translation and text-to-speech conversion, which are useful to non-native readers and visually impaired users. The central aim is to reach a wider audience by enabling digitization on the Android platform. A comparative analysis on three kinds of input (images with typographic text, images with handwritten text, and inferior-quality images) shows that the proposed approach, to a certain extent, produces superior output in at least two of the three cases when compared with the most consistent Android application currently available.
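The abstract does not spell out what the pre-processing stage does, so the following is only an illustrative sketch of one common way to clean up an image before OCR: Otsu binarization, which picks the gray level that best separates dark ink from a light background. The choice of Otsu's method and the function names `otsu_threshold` and `binarize` are assumptions made for illustration, not the authors' method.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    NOTE: Otsu's method is an assumed stand-in for the paper's pre-processing.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break               # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows of 0-255 ints) to pure 0/255."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny hypothetical image: dark strokes (20-30) on a light page (200-220).
    sample = [[20, 30, 200],
              [25, 210, 220]]
    print(binarize(sample))
```

A binarized image of this kind would then be handed to Tesseract with the Hindi and Bengali language models selected; Tesseract's language codes for these are `hin` and `ben`.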

Citation (APA)

Banerjee, S., Singh, S. K., Das, A., & Bag, R. (2019). Recognition of Hindi and Bengali handwritten and typed text from images using Tesseract on Android platform. International Journal of Innovative Technology and Exploring Engineering, 9(1), 3507–3516. https://doi.org/10.35940/ijitee.A5252.119119
