Building the Tatar-Russian NMT system based on re-translation of multilingual data

Aidar Khusainov; Dzhavdet Suleymanov; Rinat Gilmullin; Ajrat Gatiatullin

Conference Proceedings

Lecture Notes in Computer Science (2018) 11107 LNAI 163-170

DOI: 10.1007/978-3-030-00794-2_17

Abstract

This paper assesses the possibility of combining rule-based and neural network approaches to build a machine translation system for the Tatar-Russian language pair. We propose a rule-based system that makes it possible to use parallel data for a group of six Turkic languages (Tatar, Kazakh, Kyrgyz, Crimean Tatar, Uzbek, Turkish) and the Russian language to overcome the shortage of Tatar-Russian data. We combine modern approaches to data augmentation and neural network training with linguistically motivated rule-based methods. The main results of the work are the creation of the first neural Tatar-Russian translation system and an improvement in translation quality for this language pair, measured in BLEU, from 12 to 39 and from 17 to 45 for the two translation directions, compared with the existing translation system. In addition, translation between any of the Tatar, Kazakh, Kyrgyz, Crimean Tatar, Uzbek and Turkish languages becomes possible, so text in any of these Turkic languages can be translated into Russian using Tatar as an intermediate language.
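The last point of the abstract, translation into Russian through Tatar as an intermediate language, can be illustrated with a minimal Python sketch. It shows only the composition of two translation steps. The function names and the stub translators are hypothetical placeholders, not the authors' rule-based or neural components.

```python
from typing import Callable

# A translator is modelled as any function from source text to target text.
Translator = Callable[[str], str]


def make_pivot_translator(src_to_pivot: Translator,
                          pivot_to_tgt: Translator) -> Translator:
    """Compose two translators so that text passes through a pivot language."""
    def translate(text: str) -> str:
        pivot_text = src_to_pivot(text)   # e.g. Kazakh -> Tatar
        return pivot_to_tgt(pivot_text)   # e.g. Tatar -> Russian
    return translate


# Hypothetical stand-ins: they only tag the input so the data flow is visible.
def kazakh_to_tatar(text: str) -> str:
    return f"tt({text})"


def tatar_to_russian(text: str) -> str:
    return f"ru({text})"


kazakh_to_russian = make_pivot_translator(kazakh_to_tatar, tatar_to_russian)
print(kazakh_to_russian("sentence"))
```

In a setup like this, any error made in the first step is passed on to the second, so the quality of the pivot (here Tatar-Russian) system bounds the quality of the composed system.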

Cite

Khusainov, A., Suleymanov, D., Gilmullin, R., & Gatiatullin, A. (2018). Building the Tatar-Russian NMT system based on re-translation of multilingual data. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 163–170). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_17