The nature of human language and the lack of a spelling convention make historical documents hard to handle for natural language processing. Spelling normalization tackles this problem by adapting their spelling to modern standards in order to get an orthography consistency. In this work, we compare several character-based machine translation approaches, and propose a method to profit from modern documents to enrich neural machine translation models. We tested our proposal with four different data sets, and observed that the enriched models successfully improved the normalization quality of the neural models. Statistical models, however, yielded a better result.
CITATION STYLE
Domingo, M., & Casacuberta, F. (2019). Enriching Character-Based Neural Machine Translation with Modern Documents for Achieving an Orthography Consistency in Historical Documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11808 LNCS, pp. 59–69). Springer Verlag. https://doi.org/10.1007/978-3-030-30754-7_7
Mendeley helps you to discover research relevant for your work.