Automated Structured Data Extraction from Scanned Document Images

Abstract

Digital technologies are becoming part of every sector, including banking, automobile, and infrastructure. These technologies are powered by data, which raises the need to digitize documents in order to supply the data that drives digital transformation across sectors. Digitization requires extracting large amounts of data from paper-based documents. Automating this extraction makes it possible to handle large volumes of data at lower cost and with less effort. This paper proposes a solution that uses open-source components to automate data extraction from scanned documents with minimal user input. The solution generates structured output that reflects the document layout along with the data in the document, and it extracts data from tables and stamps in a well-structured format. A configuration file drives the solution and can be used to fine-tune the individual processes to improve the extracted data. For each scanned document, the solution generates an XML file that other digital processes can use to store and process the data contained in paper-based documents.
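
The abstract does not give the configuration keys or the XML schema, so the following is only a minimal sketch of the general idea: a configuration controls which extracted regions are kept, and the result is serialized as XML that mirrors the document layout. The `CONFIG` keys, the `REGIONS` sample data, the element names, and the `build_document_xml` function are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; the real solution's keys are not stated in the abstract.
CONFIG = {"min_confidence": 0.5, "include_tables": True, "include_stamps": True}

# Hypothetical extraction results standing in for OCR and layout-analysis output.
REGIONS = [
    {"type": "paragraph", "bbox": (10, 10, 400, 60), "confidence": 0.93,
     "text": "Invoice No. 1234"},
    {"type": "paragraph", "bbox": (10, 70, 400, 90), "confidence": 0.31,
     "text": "illegible"},
    {"type": "table", "bbox": (10, 100, 400, 200), "confidence": 0.88,
     "rows": [["Item", "Qty"], ["Paper", "2"]]},
    {"type": "stamp", "bbox": (300, 220, 380, 260), "confidence": 0.75,
     "text": "APPROVED"},
]


def build_document_xml(regions, config):
    """Build an XML tree from extracted regions, filtered by the configuration."""
    root = ET.Element("document")
    for region in regions:
        # Drop low-confidence regions and region types disabled in the config.
        if region["confidence"] < config["min_confidence"]:
            continue
        if region["type"] == "table" and not config["include_tables"]:
            continue
        if region["type"] == "stamp" and not config["include_stamps"]:
            continue

        # Keep the bounding box so the XML reflects where the region sat on the page.
        x0, y0, x1, y1 = region["bbox"]
        node = ET.SubElement(root, region["type"], bbox=f"{x0},{y0},{x1},{y1}")

        if region["type"] == "table":
            # Tables become nested row and cell elements.
            for row in region["rows"]:
                row_node = ET.SubElement(node, "row")
                for cell in row:
                    ET.SubElement(row_node, "cell").text = cell
        else:
            node.text = region["text"]
    return root


xml_text = ET.tostring(build_document_xml(REGIONS, CONFIG), encoding="unicode")
print(xml_text)
```

Raising `min_confidence` or switching off `include_stamps` in this sketch changes which regions reach the XML, which is one plausible way a configuration file could be used to tune the extracted data.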

Cite

Nigam, S. (2023). Automated Structured Data Extraction from Scanned Document Images. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 137, pp. 47–60). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-2600-6_4
