Morpheme-Based Neural Machine Translation Models for Low-Resource Fusion Languages

6Citations
Citations of this article
13Readers
Mendeley users who have this article in their library.

Abstract

Neural approaches, which are currently state-of-the-art in many areas, have contributed significantly to the exciting advancements in machine translation. However, Neural Machine Translation (NMT) requires a substantial quantity and good quality parallel training data to train the best model. A large amount of training data, in turn, increases the underlying vocabulary exponentially. Therefore, several proposed methods have been devised for relatively limited vocabulary due to constraints of computing resources such as system memory. Encoding words as sequences of subword units for so-called open-vocabulary translation is an effective strategy for solving this problem. However, the conventional methods for splitting words into subwords focus on statistics-based approaches that mainly conform to agglutinative languages. In these languages, the morphemes have relatively clean boundaries. These methods still need to be thoroughly investigated for their applicability to fusion languages, which is the main focus of this article. Phonological and orthographic processes alter the borders of constituent morphemes of a word in fusion languages. Therefore, it makes it difficult to distinguish the actual morphemes that carry syntactic or semantic information from the word's surface form, the form of the word as it appears in the text. We, thus, resorted to a word segmentation method that segments words by restoring the altered morphemes. We also compared conventional and morpheme-based NMT subword models. We could prove that morpheme-based models outperform conventional subword models on a benchmark dataset.

Cite

CITATION STYLE

APA

Gezmu, A. M., & Nürnberger, A. (2023). Morpheme-Based Neural Machine Translation Models for Low-Resource Fusion Languages. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(9). https://doi.org/10.1145/3610773

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free