Multimodal output combination for transcribing historical handwritten documents

Emilio Granell; Carlos D. Martínez-Hinarejos

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9256 246-260

DOI: 10.1007/978-3-319-23192-1_21

Abstract

Transcription of digitised historical documents is an interesting task in the document analysis area. A transcription can be obtained by applying Handwritten Text Recognition (HTR) to the digitised pages or by applying Automatic Speech Recognition (ASR) to a dictation of their contents. A third option is to use both systems in a multimodal combination to obtain a draft transcription, given that combining the outputs of different recognition systems generally improves recognition accuracy. In this work, we present a new combination method based on Confusion Networks and check its effectiveness for transcribing a Spanish historical book. Results on unimodal combination with different optical (for HTR) and acoustic (for ASR) models, and on multimodal combination, show relative reductions in Word Error Rate and Character Error Rate of 14.3% and 16.6%, respectively, over the HTR baseline.
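For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how Word Error Rate, Character Error Rate, and a relative reduction over a baseline are commonly computed. It is not the authors' evaluation code: the function names and the sample strings are hypothetical, chosen only for illustration, and the error rate is assumed to be the Levenshtein distance divided by the reference length.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalised by the reference length (assumed non-empty)."""
    return edit_distance(ref, hyp) / len(ref)


def wer(reference, hypothesis):
    """Word Error Rate: error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character Error Rate: error rate over individual characters."""
    return error_rate(list(reference), list(hypothesis))


def relative_reduction(baseline, improved):
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical strings for illustration only; not data from the paper.
    reference = "en el nombre de dios"
    htr_only = "en el hombre de dio"
    combined = "en el nombre de dio"
    base, new = wer(reference, htr_only), wer(reference, combined)
    print(f"WER baseline={base:.2f} combined={new:.2f} "
          f"relative reduction={relative_reduction(base, new):.1%}")
```

Under this definition, a relative reduction of 14.3% means the combined system removes 14.3% of the baseline's errors, not that the error rate drops by 14.3 percentage points.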

Granell, E., & Martínez-Hinarejos, C. D. (2015). Multimodal output combination for transcribing historical handwritten documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9256, pp. 246–260). Springer Verlag. https://doi.org/10.1007/978-3-319-23192-1_21
