Naive regularizers for low-resource neural machine translation

Meriem Beloucif; Ana Valeria Gonzalez; Marcel Bollmann; Anders Søgaard

Conference Proceedings (Open Access)

International Conference Recent Advances in Natural Language Processing, RANLP (2019) 2019-September 102-111

DOI: 10.26615/978-954-452-056-4_013

Abstract

Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. They require large volumes of data and often perform poorly when limited data is available. We show that naive regularization methods based on sentence length, punctuation, and word frequencies, which penalize translations that differ greatly from the input sentences, consistently improve translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size from 17k to 230k sentence pairs. Our best regularizer achieves an average gain of 1.5 BLEU and 1.0 TER across all language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English-Vietnamese translation task simply by using relative differences in punctuation as a regularizer.
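
The abstract does not give the exact form of the regularizers, so the following is only an illustrative sketch of the general idea: add a penalty to the training loss when the hypothesis differs from the source in punctuation count or length. The function names, the relative-difference formula, and the `weight` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import string

# Assumption: a token counts as punctuation if every character is ASCII punctuation.
PUNCT = set(string.punctuation)


def count_punct(tokens):
    """Count tokens that consist solely of punctuation characters."""
    return sum(1 for t in tokens if t and all(ch in PUNCT for ch in t))


def relative_difference(a, b):
    """Relative difference in [0, 1]; defined as 0.0 when both counts are zero."""
    denom = max(a, b)
    return 0.0 if denom == 0 else abs(a - b) / denom


def punctuation_penalty(src_tokens, hyp_tokens):
    """Hypothetical penalty: relative difference in punctuation counts."""
    return relative_difference(count_punct(src_tokens), count_punct(hyp_tokens))


def length_penalty(src_tokens, hyp_tokens):
    """Hypothetical penalty: relative difference in sentence length."""
    return relative_difference(len(src_tokens), len(hyp_tokens))


def regularized_loss(nll, penalty, weight=0.1):
    """Add a weighted penalty to a base loss (for example, a negative log-likelihood).

    `weight` is an assumed hyperparameter, not a value reported in the paper.
    """
    return nll + weight * penalty


src = "Hello , world !".split()
hyp = "Xin chào thế giới".split()
loss = regularized_loss(2.0, punctuation_penalty(src, hyp), weight=0.5)
```

In this toy pair the source has two punctuation tokens and the hypothesis has none, so the punctuation penalty is at its maximum while the length penalty is zero. In a real system the penalty would be computed per sentence pair and combined with the model's training objective.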

Citation (APA)

Beloucif, M., Gonzalez, A. V., Bollmann, M., & Søgaard, A. (2019). Naive regularizers for low-resource neural machine translation. In International Conference Recent Advances in Natural Language Processing, RANLP (Vol. 2019-September, pp. 102–111). Incoma Ltd. https://doi.org/10.26615/978-954-452-056-4_013
