Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English


Abstract

Recent work has shown that deeper character-based neural machine translation (NMT) models can outperform subword-based models. However, it is still unclear what makes deeper character-based models successful. In this paper, we investigate pure character-based models for translating Finnish into English, exploring their ability to learn word senses and morphological inflections as well as the behavior of the attention mechanism. We demonstrate that word-level information is distributed over the entire character sequence rather than concentrated in a single character, and that characters at different positions play different roles in learning linguistic knowledge. In addition, character-based models need more layers to encode word senses, which explains why only deeper models outperform subword-based models. The attention distribution pattern shows that separators attract a lot of attention, and we explore a sparse word-level attention that forces character hidden states to capture full word-level information. Experimental results show that word-level attention with a single head leads to a drop of 1.2 BLEU points.
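To make the sparse word-level attention described above concrete, here is a minimal sketch, not the authors' implementation: it assumes dot-product attention and treats separator positions as word boundaries, so the decoder may attend only to the character hidden states at those boundaries, forcing each attended state to summarize a full word. All function and variable names are hypothetical.

```python
import numpy as np

def word_level_attention(decoder_state, encoder_states, sep_positions):
    """Sparse word-level attention over character hidden states (sketch).

    decoder_state:  (d,)      current decoder hidden state
    encoder_states: (T, d)    character-level encoder hidden states
    sep_positions:  list[int] indices of separator (word-boundary) characters
    """
    # Restrict attention to word-boundary positions only, so each
    # attended hidden state must encode the whole preceding word.
    word_states = encoder_states[sep_positions]   # (W, d)

    # Standard dot-product attention over the W word-boundary states.
    scores = word_states @ decoder_state          # (W,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over words

    context = weights @ word_states               # (d,) attended context
    return context, weights

# Usage: 10 characters, hidden size 4, separators after each word.
enc = np.random.randn(10, 4)
ctx, w = word_level_attention(np.random.randn(4), enc, [3, 7, 9])
```

Because the softmax runs over word-boundary positions rather than all character positions, the attention is sparse at the character level, matching the constraint the abstract describes.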

Citation (APA)

Tang, G., Sennrich, R., & Nivre, J. (2020). Understanding Pure Character-Based Neural Machine Translation: The Case of Translating Finnish into English. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 4251–4262). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.375
