Text retrieval from scanned forms using optical character recognition

Vaishali Aggarwal; Sourabh Jajoria; Apoorvi Sood

Conference Proceedings

Advances in Intelligent Systems and Computing (2018) 651 207-216

DOI: 10.1007/978-981-10-6614-6_21

Abstract

This paper investigates the use of image processing techniques and logistic regression to extract text from scanned forms. Converting printed or handwritten documents into editable digital text is tedious and requires considerable human effort. To automate this task, we apply the logistic regression machine learning algorithm. The system has two main components: (i) text detection in the scanned document and (ii) recognition of the individual characters in the detected text. To complete these tasks, we first use image processing techniques to perform line segmentation and character segmentation, and then carry out character recognition. Character recognition is done by a one-vs-all classifier that learns its parameters from a training data set. Once the classifier has learned the parameters, it can identify a total of 39 characters, which include uppercase English letters, numerals, and a few symbols.
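The pipeline described in the abstract (line segmentation, character segmentation, then one-vs-all logistic regression) can be illustrated with a minimal sketch. The abstract does not say how segmentation or training is implemented, so everything here is an assumption made for illustration: the projection-profile segmentation, the stochastic gradient descent training loop, and all function names (`runs`, `segment_lines`, `segment_characters`, `train_one_vs_all`, `predict`) are hypothetical and are not taken from the paper. The input is assumed to be an already binarized image stored as a list of rows, with 1 for ink and 0 for background.

```python
import math

def runs(profile):
    """Return (start, end) index pairs for each run of non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_lines(image):
    """Assumed approach: rows containing ink form text lines (horizontal projection)."""
    return runs([sum(row) for row in image])

def segment_characters(image, top, bottom):
    """Assumed approach: columns containing ink form characters (vertical projection)."""
    width = len(image[0])
    profile = [sum(image[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def train_one_vs_all(samples, labels, classes, lr=0.5, epochs=500):
    """Fit one binary logistic-regression model per class (illustrative SGD)."""
    n_features = len(samples[0])
    models = {}
    for cls in classes:
        weights, bias = [0.0] * n_features, 0.0
        for _ in range(epochs):
            for x, label in zip(samples, labels):
                target = 1.0 if label == cls else 0.0
                z = sum(w * v for w, v in zip(weights, x)) + bias
                error = sigmoid(z) - target  # gradient of the log loss w.r.t. z
                weights = [w - lr * error * v for w, v in zip(weights, x)]
                bias -= lr * error
        models[cls] = (weights, bias)
    return models

def predict(models, x):
    """Pick the class whose binary model reports the highest probability."""
    def score(cls):
        weights, bias = models[cls]
        return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return max(models, key=score)
```

In a real system each segmented character would be cropped, resized to a fixed grid, and flattened into a feature vector before being passed to `predict`; with 39 target characters, `classes` would hold 39 labels and `train_one_vs_all` would fit 39 binary models.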

Cite

Aggarwal, V., Jajoria, S., & Sood, A. (2018). Text retrieval from scanned forms using optical character recognition. In Advances in Intelligent Systems and Computing (Vol. 651, pp. 207–216). Springer Verlag. https://doi.org/10.1007/978-981-10-6614-6_21
