AUTOMATING DIGITIZED DOCUMENT PROCESSING WITH HANDWRITTEN DIGITS IN THE PUBLIC SECTOR USING CONVOLUTIONAL NEURAL NETWORKS

Abstract

Aim
Automating information extraction from digitized complex documents that contain both printed and handwritten text (as well as non-textual information) is one of the pressing problems in the digital transformation of public administration. This study proposes a machine learning (ML)-based approach to improve the quality of automated extraction and processing of numerical data from digitized documents containing handwritten digits, using optical character recognition (OCR) technology.

Background
Currently, in public institutions in Ukraine, manual intervention is a bottleneck in extracting numerical data from digitized documents and processing it further. New approaches to automating these processes are needed.

Methodology
The methodology includes preprocessing of the document image, segmentation and classification of handwritten digits, conversion of the extracted digits into a date format with the possibility of validating them, and performing the necessary calculations on the extracted numerical information. First, handwritten digits are segmented from scanned images of document pages; they are then preprocessed and passed to a recognition module based on convolutional neural networks (CNNs). The image preprocessing steps consisted of binarization, a Gaussian filter to remove noise, and the Hough transform to correct the document's skew angle. A CNN model performed character-by-character classification to recognize the segmented digits.

Contribution
The study addresses current limitations in extracting handwritten digits from complex document images. The segmentation technique uses morphological transformations such as erosion and dilation, as well as the connected components method. The study explored different CNN architectures to determine the optimal hyperparameter configuration.
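The segmentation step named above (binarization, erosion and dilation, connected components) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on several assumptions: a grayscale page fragment given as a list of rows of 0 to 255 intensities, a fixed global threshold, a 3x3 square structuring element applied as a closing (dilation followed by erosion) to bridge small gaps in strokes, and 8-connectivity. The function names and the `min_area` noise filter are hypothetical, and the Gaussian filter and Hough-based skew correction are omitted.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map dark (ink) pixels to 1 and light (paper) pixels to 0 with a fixed global threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def _morph(img, keep):
    """Slide a 3x3 square structuring element over a binary image.

    keep=any gives dilation, keep=all gives erosion. Out-of-bounds
    neighbors are ignored, so the image border does not erode shapes.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = 1 if keep(window) else 0
    return out


def dilate(img):
    return _morph(img, any)


def erode(img):
    return _morph(img, all)


def connected_components(img, min_area=1):
    """Label 8-connected foreground regions and return their bounding boxes.

    Boxes are (x0, y0, x1, y1), inclusive, sorted left to right; regions
    smaller than min_area pixels are dropped as noise.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            area, x0, y0, x1, y1 = 0, x, y, x, y
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:
                boxes.append((x0, y0, x1, y1))
    return sorted(boxes)


def segment_digits(gray, threshold=128, min_area=4):
    """Binarize, close small gaps in strokes (dilate, then erode), and box each component."""
    closed = erode(dilate(binarize(gray, threshold)))
    return connected_components(closed, min_area)
```

The closing order is a design choice for this sketch: it rejoins digit strokes broken by faint ink, while the area filter discards isolated specks. In a production pipeline, OpenCV's `cv2.dilate`, `cv2.erode`, and `cv2.connectedComponentsWithStats` perform the same operations far faster; each bounding box would then be cropped, resized, and passed to the CNN classifier.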
The research findings confirm the importance of integrating methods for effective image preprocessing, segmentation, and recognition of the extracted digits.

Findings
Experimental results demonstrate that the proposed approach reduces processing time by a factor of 7.7 and increases the accuracy of numerical data recognition on pages containing fragments of handwritten digits. This indicates that operational tasks are completed with higher accuracy and efficiency.

Recommendations for Practitioners
Software may be developed based on this research and implemented in the Pension Fund of Ukraine to automate the processing of digitized documents used to determine length of service and calculate the amount of pension to be awarded. The solution makes it possible to eliminate manual intervention in data extraction and processing.

Recommendations for Researchers
The study showed high accuracy in recognizing individual handwritten digits: 99.68% on the training data and 99.55% on the test data. However, recognition accuracy is lower when complex documents contain both handwritten and printed text as well as non-textual information. This is due to the difficulty of segmenting handwritten characters, which requires further research to identify more effective preprocessing and segmentation methods.

Impact on Society
The proposed methodology significantly reduces the human factor in extracting data from digitized documents, accelerating their processing and increasing the efficiency of government institutions.

Future Research
Further research may explore automated processing of digitized documents in other branches of state administration, taking into account the specifics of the data extracted for further processing.
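The step that converts recognized digits into dates, validates them, and computes a length of service can be sketched with Python's standard library. This is a minimal sketch under stated assumptions, not the authors' method: the digits recognized by the CNN are assumed to arrive as a string, dates are assumed to be written as eight digits in DDMMYYYY order, and service length is taken as a plain calendar difference. The names `digits_to_date`, `_add_months`, and `service_length` are hypothetical, and the sketch does not implement the Pension Fund of Ukraine's actual rules for counting service.

```python
import calendar
from datetime import date


def digits_to_date(digits: str) -> date:
    """Convert recognized digits, assumed to be in DDMMYYYY order, into a validated date."""
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError(f"expected 8 digits, got {digits!r}")
    day, month, year = int(digits[:2]), int(digits[2:4]), int(digits[4:])
    # date() raises ValueError for impossible dates such as 31 February,
    # which can flag a likely recognition error for manual review.
    return date(year, month, day)


def _add_months(d: date, n: int) -> date:
    """Shift a date by n calendar months, clamping the day to the month's length."""
    years, month_index = divmod(d.month - 1 + n, 12)
    year, month = d.year + years, month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def service_length(start: date, end: date) -> tuple[int, int, int]:
    """Plain calendar difference (years, months, days) from start to end."""
    if end < start:
        raise ValueError("end date precedes start date")
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    days = (end - _add_months(start, months)).days
    return months // 12, months % 12, days


if __name__ == "__main__":
    hired = digits_to_date("15031998")
    dismissed = digits_to_date("01092023")
    print(service_length(hired, dismissed))  # (25, 5, 17)
```

Rejecting impossible dates at conversion time, rather than during the calculation, keeps recognition errors from silently propagating into the pension amount.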

Citation (APA)

Boliubash, N., & Yevtushenko, O. (2025). AUTOMATING DIGITIZED DOCUMENT PROCESSING WITH HANDWRITTEN DIGITS IN THE PUBLIC SECTOR USING CONVOLUTIONAL NEURAL NETWORKS. Interdisciplinary Journal of Information, Knowledge, and Management, 20. https://doi.org/10.28945/5527