Transcribing handwritten documents makes their contents accessible to the general public. So far, however, automatic transcription of historical documents has mostly focused on producing diplomatic transcripts, even though such transcripts are often understandable only by experts. The main difficulties come from the heavy use of extremely abridged and tangled abbreviations and of archaic or outdated word forms. Here we study different approaches to training optical models that can recognize historical document images containing archaic and abbreviated handwritten text and produce modernized transcripts with expanded abbreviations. Experiments comparing the performance of the proposed approaches are carried out on a document collection related to Spanish naval commerce during the XV–XIX centuries, which includes extremely difficult handwritten text images.
CITATION STYLE
Romero, V., Toselli, A. H., Vidal, E., Sánchez, J. A., Alonso, C., & Marqués, L. (2019). Modern vs Diplomatic Transcripts for Historical Handwritten Text Recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11808 LNCS, pp. 103–114). Springer Verlag. https://doi.org/10.1007/978-3-030-30754-7_11