Building the Tatar-Russian NMT system based on re-translation of multilingual data

Abstract

This paper assesses the feasibility of combining rule-based and neural network approaches to build a machine translation system for the Tatar-Russian language pair. We propose a rule-based system that makes it possible to use parallel data between Russian and a group of six Turkic languages (Tatar, Kazakh, Kyrgyz, Crimean Tatar, Uzbek, Turkish) to overcome the shortage of Tatar-Russian data. We combine modern approaches to data augmentation and neural network training with linguistically motivated rule-based methods. The main results are the first neural Tatar-Russian translation system and an improvement in translation quality for this language pair over the existing translation system: BLEU scores rise from 12 to 39 in one translation direction and from 17 to 45 in the other. Translation between any of the Tatar, Kazakh, Kyrgyz, Crimean Tatar, Uzbek, and Turkish languages also becomes possible, so text in any of these Turkic languages can be translated into Russian using Tatar as an intermediate language.
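The use of Tatar as an intermediate language can be illustrated with a minimal sketch of pivot translation, in which two translators are chained. This is an illustration under stated assumptions and not the authors' system: the names `make_pivot_translator` and `lookup` and the one-word dictionaries are hypothetical stand-ins for real translation models.

```python
from typing import Callable, Dict

# A translator is modeled as any function from source text to target text.
Translator = Callable[[str], str]


def make_pivot_translator(to_pivot: Translator, from_pivot: Translator) -> Translator:
    """Chain two translators: source -> pivot language -> target."""
    def translate(text: str) -> str:
        return from_pivot(to_pivot(text))
    return translate


def lookup(table: Dict[str, str]) -> Translator:
    """Hypothetical toy translator: word-by-word dictionary lookup.

    Unknown words are passed through unchanged.
    """
    def translate(text: str) -> str:
        return " ".join(table.get(token, token) for token in text.split())
    return translate


# Toy one-word dictionaries standing in for trained models (assumption).
KAZAKH_TO_TATAR = {"кітап": "китап"}   # "book"
TATAR_TO_RUSSIAN = {"китап": "книга"}  # "book"

kazakh_to_russian = make_pivot_translator(
    lookup(KAZAKH_TO_TATAR),   # Kazakh -> Tatar (pivot)
    lookup(TATAR_TO_RUSSIAN),  # Tatar (pivot) -> Russian
)

print(kazakh_to_russian("кітап"))  # prints "книга"
```

In a chain like this, errors made in the first step carry over to the second, so the quality of the Tatar-Russian step limits every pivoted pair.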

Cite

Khusainov, A., Suleymanov, D., Gilmullin, R., & Gatiatullin, A. (2018). Building the Tatar-Russian NMT system based on re-translation of multilingual data. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 163–170). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_17
