Recognize Meaningful Words and Idioms from the Images Based on OCR Tesseract Engine and NLTK

Abstract

Optical character recognition (OCR) is a text extraction technology that works with photos, scanned data, and PDF documents. By extracting text data, OCR systems typically convert non-editable, non-searchable documents into editable, searchable files, which simplifies finding and identifying information in digitized files. Tesseract is a powerful OCR engine that supports over 100 languages; R bindings are provided by the Tesseract package. The engine is highly customizable, so its detection algorithms can be fine-tuned to achieve the best possible results. Using Tesseract OCR, a method for extracting text from photos was created. The proposed OCR system accepts any image as input and converts it into a searchable text document. The system can also search for words within the generated text and display their Bengali meanings. Recognition proceeds in stages: the engine first finds the lines and words, then recognizes the words; a static character classifier classifies each character, analysis follows, and finally an adaptive classifier is applied. In addition to OCR, the framework includes a natural language processing approach for classifying commonly used terms from the output text and giving their Bangla meanings.
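
The pipeline described in the abstract (OCR an image, then look up words and idioms in the recognized text) can be sketched roughly as follows in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the glossary entries and the function names `extract_text`, `tokenize`, and `find_meanings` are hypothetical, and a simple regular-expression tokenizer stands in for the NLTK processing the paper refers to. The OCR step assumes the `pytesseract` and `Pillow` packages and a local Tesseract installation; they are imported lazily so the lookup step can be tried without them.

```python
import re

# Hypothetical glossary for illustration only; not the paper's data.
GLOSSARY = {
    "book": "বই",
    "water": "পানি",
    "piece of cake": "খুব সহজ কাজ",
}


def extract_text(image_path, lang="eng"):
    """OCR an image file with Tesseract through pytesseract.

    Assumes pytesseract, Pillow, and the Tesseract binary are installed.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path), lang=lang)


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A regex stand-in for an NLTK tokenizer (an assumption of this sketch).
    """
    return re.findall(r"[a-z']+", text.lower())


def find_meanings(text, glossary=GLOSSARY):
    """Return glossary entries (single words or multi-word idioms) found in text."""
    tokens = tokenize(text)
    found = {}
    for phrase, meaning in glossary.items():
        words = phrase.split()
        n = len(words)
        # Match the phrase as a contiguous run of whole tokens.
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            found[phrase] = meaning
    return found
```

In use, the OCR output would be passed straight to the lookup, for example `find_meanings(extract_text("page.png"))`, where `page.png` is a placeholder file name. Matching on whole tokens means that "notebook" does not trigger the entry for "book".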

Cite

Chakraborty, P., Rakib Mia, M., Sumon, H. K., Sarker, A., Imtiaz, A., Mahbubur Rahman, M., … Choudhury, T. (2022). Recognize Meaningful Words and Idioms from the Images Based on OCR Tesseract Engine and NLTK. In Lecture Notes in Electrical Engineering (Vol. 888, pp. 297–310). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-1520-8_23