Automated Structured Data Extraction from Scanned Document Images

Shivani Nigam

Book Chapter

Nigam S

Springer Science and Business Media Deutschland GmbH, (2023), 47-60

DOI: 10.1007/978-981-19-2600-6_4

Abstract

Digital technologies are becoming part of every sector, including banking, automobile, and infrastructure. These technologies are powered by data, which increases the need to digitize documents so that data is available to drive digital transformation across sectors. Digitization requires extracting large amounts of data from paper-based documents. Automating this extraction makes it possible to handle large volumes of data at lower cost and with less effort. This chapter proposes a solution that uses open-source components to automate data extraction from scanned documents with minimal user input. The solution generates structured output that reflects the document's layout along with its data, and it extracts data from tables and stamps in a well-structured format. It is driven by a configuration file, which allows the individual processing steps to be fine-tuned to improve the extracted data. For each scanned document, the solution generates an XML file that other digital processes can use to store and process the data contained in the paper-based original.
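The abstract does not give the chapter's configuration keys, XML schema, or OCR components, so the following is only a minimal sketch of the general idea it describes: a configuration-driven step that turns recognized regions (text, tables, stamps) into XML whose element order mirrors the page layout. Everything named here is an assumption for illustration, including `CONFIG`, `REGIONS`, `build_xml`, the `min_confidence` and `include_stamps` keys, and the element names. The region list stands in for the output of an OCR and layout-analysis stage, which is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration; the chapter's real config keys are not known.
CONFIG = {"min_confidence": 0.5, "include_stamps": True}

# Hypothetical output of an OCR/layout stage: one dict per detected region,
# with a bounding box given as (x0, y0, x1, y1) in pixels.
REGIONS = [
    {"kind": "table", "bbox": (40, 100, 500, 220), "conf": 0.88,
     "rows": [["Item", "Qty"], ["Paper", "3"]]},
    {"kind": "text", "bbox": (40, 30, 500, 60), "conf": 0.93,
     "text": "Invoice No. 1042"},
    {"kind": "stamp", "bbox": (380, 240, 480, 300), "conf": 0.71,
     "text": "APPROVED"},
    {"kind": "text", "bbox": (40, 320, 500, 340), "conf": 0.20,
     "text": "~~~"},
]


def build_xml(regions, config):
    """Build an XML tree whose child order follows the page layout."""
    root = ET.Element("document")
    # Sort top-to-bottom, then left-to-right, so XML order mirrors the page.
    for region in sorted(regions, key=lambda r: (r["bbox"][1], r["bbox"][0])):
        # Config-driven filtering: drop low-confidence regions.
        if region["conf"] < config["min_confidence"]:
            continue
        # Config-driven filtering: optionally skip stamps.
        if region["kind"] == "stamp" and not config["include_stamps"]:
            continue
        x0, y0, x1, y1 = region["bbox"]
        node = ET.SubElement(root, region["kind"], bbox=f"{x0},{y0},{x1},{y1}")
        if region["kind"] == "table":
            # Tables keep their row/cell structure.
            for row in region["rows"]:
                row_node = ET.SubElement(node, "row")
                for cell in row:
                    ET.SubElement(row_node, "cell").text = cell
        else:
            node.text = region["text"]
    return root


if __name__ == "__main__":
    print(ET.tostring(build_xml(REGIONS, CONFIG), encoding="unicode"))
```

In this sketch, changing a value in `CONFIG` changes which regions reach the XML without touching the code. That is one plausible reading of "driven by a configuration file", not a description of the chapter's actual implementation.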

CITATION STYLE

APA

Nigam, S. (2023). Automated Structured Data Extraction from Scanned Document Images. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 137, pp. 47–60). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-2600-6_4
