Text retrieval from scanned forms using optical character recognition

Abstract

This paper investigates the use of image processing techniques and logistic regression, a machine learning algorithm, to extract text from scanned forms. Converting printed or handwritten documents into editable digital text is tedious and requires considerable human effort. To automate this task, we apply logistic regression. The system has two main components: (i) detection of text in the scanned document and (ii) recognition of the individual characters in the detected text. To accomplish these tasks, we first use image processing techniques to perform line segmentation and character segmentation, and then perform character recognition. Character recognition is done by a one-vs-all classifier whose parameters are learned from a training data set. Once trained, the classifier can identify a total of 39 characters, comprising uppercase English letters, numerals, and a few symbols.
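
The pipeline the abstract describes (line segmentation, character segmentation, then one-vs-all logistic regression) might be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the abstract does not say how segmentation is done, so the sketch assumes a simple projection-profile approach on a binarized image (1 = ink), and it trains the classifier with plain batch gradient descent. All function names and parameters (`runs`, `segment_characters`, `train_one_vs_all`, `predict`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np

def runs(mask):
    """Return (start, end) pairs, end exclusive, for each run of True in a 1-D mask."""
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]

def segment_characters(binary):
    """Split a binary image (1 = ink) into character crops.

    Assumption: text lines are separated by blank rows and characters by
    blank columns, so row and column ink sums are enough to find them.
    """
    crops = []
    for top, bottom in runs(binary.sum(axis=1) > 0):      # line segmentation
        line = binary[top:bottom]
        for left, right in runs(line.sum(axis=0) > 0):    # character segmentation
            crops.append(line[:, left:right])
    return crops

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Prepend a column of ones so the first parameter acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def train_one_vs_all(X, y, num_classes, lr=0.5, epochs=500):
    """Fit one binary logistic-regression model per class by gradient descent."""
    Xb = add_bias(X)
    theta = np.zeros((num_classes, Xb.shape[1]))
    for k in range(num_classes):
        target = (y == k).astype(float)   # class k against all other classes
        for _ in range(epochs):
            grad = Xb.T @ (sigmoid(Xb @ theta[k]) - target) / len(y)
            theta[k] -= lr * grad
    return theta

def predict(theta, X):
    """Pick, for each sample, the class whose model gives the highest probability."""
    return np.argmax(sigmoid(add_bias(X) @ theta.T), axis=1)
```

In the setting of the abstract, `num_classes` would be 39, and each row of `X` would presumably be a fixed-length feature vector derived from one character crop (for example, the pixels of the crop resized to a common size); that feature choice is an assumption, since the abstract does not describe it.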

Cite

Aggarwal, V., Jajoria, S., & Sood, A. (2018). Text retrieval from scanned forms using optical character recognition. In Advances in Intelligent Systems and Computing (Vol. 651, pp. 207–216). Springer Verlag. https://doi.org/10.1007/978-981-10-6614-6_21
