Evaluation of Deep Learning Techniques for Content Extraction in Spanish Colonial Notary Records

Abstract

Processing and analyzing historical manuscripts is considered one of the most challenging problems in the document analysis and recognition domain. Manuscripts written in cursive are even more difficult because of overlapping words with random spacing, irregular and varying character shapes, poor scan quality, and insufficient labeled data. Despite the significant achievements of deep learning approaches in computer vision, handwritten word recognition is far from solved, and most existing methods focus on well-segmented word datasets. In this paper, we present an empirical study of how well state-of-the-art deep learning models detect and recognize handwritten words in Spanish American notary records. Professional historians helped prepare the labeled dataset of 26,482 Spanish words used in the experiments. We investigate the performance of several state-of-the-art models for optical character recognition (OCR) on handwritten text documents: Keras-OCR, the object detection algorithm "You Only Look Once" (YOLO), Tesseract OCR, Kraken, and Calamari-OCR. Since YOLO does not include a text recognizer, we propose YOLO-OCR, an innovative model to detect and recognize words in historical manuscripts written in Spanish. Our results show how pre-trained models perform on our dataset and indicate that the Keras-OCR and YOLO-OCR models are highly valuable for content extraction.
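The abstract describes content extraction as two stages: a detector locates words on the page, and a recognizer transcribes each located word. The sketch below illustrates only that general detect-then-recognize shape. It is not the authors' YOLO-OCR implementation; the names `Box`, `WordResult`, `crop`, and `extract_words` are hypothetical, and the detector and recognizer are passed in as plain callables so that any model could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only.
# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]
# An image is a sequence of pixel rows; a stand-in for a real array type.
Image = Sequence[Sequence[int]]


@dataclass
class WordResult:
    box: Box   # where the word was found on the page
    text: str  # what the recognizer read inside that box


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the region covered by `box` out of `image`."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def extract_words(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[WordResult]:
    """Stage 1: detect word boxes. Stage 2: recognize each cropped word."""
    results = []
    for box in detect(image):
        results.append(WordResult(box, recognize(crop(image, box))))
    return results
```

Keeping the two stages behind separate callables reflects the point made in the abstract that a detector such as YOLO has no recognizer of its own, so one has to be attached to it.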

Cite

Alrasheed, N., Prasanna, S., Rowland, R., Rao, P., Grieco, V., & Wasserman, M. (2021). Evaluation of Deep Learning Techniques for Content Extraction in Spanish Colonial Notary Records. In SUMAC 2021 - Proceedings of the 3rd Workshop on Structuring and Understanding of Multimedia heritAge Contents, co-located with ACM MM 2021 (pp. 23–30). Association for Computing Machinery, Inc. https://doi.org/10.1145/3475720.3484443
