A comparison of deep transfer learning backbone architecture techniques for printed text detection of different font styles from unstructured documents

Supriya Mahadevkar; Shruti Patil; Ketan Kotecha; Ajith Abraham

Journal Article, Open Access

PeerJ Computer Science (2024) 10

DOI: 10.7717/peerj-cs.1769

Abstract

Deep learning-based object detection methods have been used in a variety of sectors, including banking, healthcare, e-governance, and academia. In recent years, considerable research attention has been paid to text detection and recognition from different scenes or images in unstructured document processing. The article's novelty lies in its detailed discussion and implementation of various transfer learning-based backbone architectures for printed text recognition. The authors compared the ResNet50, ResNet50V2, ResNet152V2, Inception, Xception, and VGG19 backbone architectures, with data resizing, normalization, and noise removal as preprocessing techniques, on a standard OCR Kaggle dataset. The top three backbone architectures were then selected based on the accuracy achieved, and hyperparameter tuning was performed to obtain more accurate results. Xception performed well compared with the ResNet, Inception, VGG19, and MobileNet architectures, achieving high evaluation scores with an accuracy of 98.90% and a minimum loss of 0.19. According to existing research in this domain, transfer learning-based backbone architectures applied to printed or handwritten data recognition are not yet well represented in the literature. We split the dataset into 80 percent for training and 20 percent for testing, trained the different backbone architecture models for the same number of epochs, and found that the Xception architecture achieved higher accuracy than the others. In addition, the ResNet50V2 model gave higher accuracy (96.92%) than the ResNet152V2 model (96.34%).
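The two bookkeeping steps described in the abstract, the 80/20 train/test split and the selection of the top backbones by accuracy, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_indices` and `top_k_backbones` are hypothetical, the shuffling and seed are assumptions (the abstract does not say how the split was drawn), and the accuracy table contains only the three figures quoted in the abstract, not the full set of results from the article.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) lists.

    Assumption: a simple random split; the abstract only states 80/20.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def top_k_backbones(accuracy_by_backbone, k=3):
    """Return the names of the k backbones with the highest accuracy."""
    ranked = sorted(
        accuracy_by_backbone.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Accuracies (in percent) quoted in the abstract; other backbones are omitted
# because the abstract does not report their scores.
reported_accuracy = {
    "ResNet152V2": 96.34,
    "Xception": 98.90,
    "ResNet50V2": 96.92,
}

train_idx, test_idx = split_indices(1000)  # 1000 is an arbitrary sample count
print(len(train_idx), len(test_idx))
print(top_k_backbones(reported_accuracy, k=3))
```

With a fuller accuracy table, the same `top_k_backbones` call would pick the candidates that go on to hyperparameter tuning.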

Mahadevkar, S., Patil, S., Kotecha, K., & Abraham, A. (2024). A comparison of deep transfer learning backbone architecture techniques for printed text detection of different font styles from unstructured documents. PeerJ Computer Science, 10. https://doi.org/10.7717/peerj-cs.1769
