Experimenting with different machine translation models in medium-resource settings

Abstract

State-of-the-art machine translation (MT) systems rely on the availability of large parallel corpora, containing millions of sentence pairs. For the Icelandic language, the parallel corpus ParIce exists, consisting of about 3.6 million English-Icelandic sentence pairs. Given that parallel corpora for low-resource languages typically contain sentence pairs in the tens or hundreds of thousands, we classify Icelandic as a medium-resource language for MT purposes. In this paper, we present ongoing experiments with different MT models, both statistical and neural, for translating English to Icelandic based on ParIce. We describe the corpus and the filtering process used for removing noisy segments, the different models used for training, and the preliminary automatic and human evaluation. We find that, when using an aggressive filtering approach, the most recent neural MT system (Transformer) performs best, obtaining the highest BLEU score and the highest fluency and adequacy scores from human evaluation for in-domain translation. Our work could be beneficial to other languages for which a similar amount of parallel data is available.
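The abstract reports automatic evaluation via BLEU, which scores a system's output against reference translations by combining clipped n-gram precisions (typically up to 4-grams) with a brevity penalty. The paper does not include its evaluation code; the sketch below is a minimal, self-contained illustration of corpus-level BLEU over whitespace-tokenized sentences (in practice, MT papers typically use a standard implementation such as sacreBLEU, which also handles tokenization and smoothing).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with uniform n-gram weights and a
    brevity penalty; assumes one reference per hypothesis and
    whitespace tokenization (a simplification of standard BLEU)."""
    clipped = [0] * max_n   # matched n-gram counts, clipped by reference
    totals = [0] * max_n    # total n-gram counts in the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            totals[n - 1] += sum(h_ngrams.values())
            clipped[n - 1] += sum(min(c, r_ngrams[g])
                                  for g, c in h_ngrams.items())
    if min(totals) == 0 or min(clipped) == 0:
        return 0.0  # no smoothing: any zero precision yields BLEU 0
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

A perfect match scores 100; partial n-gram overlap yields intermediate scores, and the brevity penalty keeps systems from gaming precision with very short outputs.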

Citation (APA)

Jónsson, H. P., Símonarson, H. B., Snæbjarnarson, V., Steingrímsson, S., & Loftsson, H. (2020). Experimenting with different machine translation models in medium-resource settings. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12284 LNAI, pp. 95–103). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58323-1_10
