Recognition of Hindi and Bengali handwritten and typed text from images using Tesseract on Android platform


Abstract

Digitization has transformed data conversion, storage and sharing by turning non-editable typographic and handwritten text into editable electronic text. Although Optical Character Recognition (OCR) has been applied to many languages around the world, satisfactory output has been observed in only a few of them. This paper takes a step towards the digitization of two of the most widely spoken languages of the Indian subcontinent, Hindi and Bengali, using Google's open-source OCR engine, Tesseract. The scripts of these two languages, both of Brahmi origin, pose their own challenges because of their distinctive character segmentation and word formation. Here, Tesseract is trained on data sets of Hindi and Bengali typographic and handwritten characters, and the training is combined with a custom pre-processing stage involving input image customization and image augmentation. This stage significantly enhances image quality, allowing Tesseract to produce more accurate results, especially for handwritten text and obscure images. The system also incorporates English translation and text-to-speech conversion, features that matter to non-native readers and visually impaired users. The central aim of the paper is to reach a wider audience by enabling digitization on the Android platform. A comparative analysis on three kinds of input (images with typographic text, images with handwritten text, and inferior-quality images) shows that the proposed approach, to a certain extent, produces superior output in at least two of these cases compared with the most consistent Android application currently available.
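The abstract does not say which operations make up the pre-processing stage. As an illustration only, the sketch below shows one common way to improve image quality before OCR: binarization with Otsu's method. The choice of Otsu's method and the function names `otsu_threshold` and `binarize` are assumptions made for this example and are not taken from the paper.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 8-bit threshold that maximizes between-class variance.

    Illustrative assumption: the paper does not name its binarization method.
    Pixels <= threshold form one class, pixels > threshold the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break             # no bright pixels left
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map a grayscale image (rows of 0..255 values) to pure black and white."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny hypothetical image: dark "ink" on the top row, bright "paper" below.
    sample = [[10, 20], [200, 220]]
    print(binarize(sample))
```

In a pipeline like the one the abstract describes, an image binarized this way would then be passed to Tesseract with the relevant language models loaded; Tesseract's standard language codes for Hindi and Bengali are `hin` and `ben`.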

APA

Banerjee, S., Singh, S. K., Das, A., & Bag, R. (2019). Recognition of Hindi and Bengali handwritten and typed text from images using Tesseract on Android platform. International Journal of Innovative Technology and Exploring Engineering, 9(1), 3507–3516. https://doi.org/10.35940/ijitee.A5252.119119
