Word transduction for addressing the OOV problem in machine translation for similar resource-scarce languages

Abstract

Similar languages share a large number of cognate words, which can be exploited to deal with the out-of-vocabulary (OOV) word problem. This problem is especially severe for resource-scarce languages. We propose a method of 'word transduction' to address it. We take advantage of the fact that, although it is difficult to prepare a sentence-aligned parallel corpus for such languages, it is much easier to prepare a 'parallel' list of word pairs that are cognates and have similar pronunciations. The pronunciations (or orthographic representations) of OOV words can then be learned from such a parallel list. This could be done using phrase-based machine translation (PBMT). We show that, for small amounts of data, a model based on weighted rewrite rules for phoneme chunks outperforms a PBMT-based approach. We also point out that word transduction can be used to borrow words from another similar language and adapt them to the phonology of the target language.
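
To make the idea of weighted rewrite rules over chunks more concrete, here is a minimal sketch in Python. It is not the authors' model: the rule table `RULES`, its probabilities, the function name `transduce`, and the `max_chunk` parameter are all hypothetical, and the step of learning rule weights from a cognate list is not shown. The sketch only illustrates how a source word might be segmented into chunks and rewritten by picking the highest-probability sequence of rules with a Viterbi-style search.

```python
import math

# Hypothetical toy rules: source chunk -> list of (target chunk, probability).
# The values are made up for illustration and do not come from any real
# language pair or from the paper.
RULES = {
    "k": [("k", 0.7), ("g", 0.3)],
    "a": [("a", 0.9), ("o", 0.1)],
    "s": [("s", 0.6), ("z", 0.4)],
    "h": [("h", 1.0)],
    "sh": [("s", 0.8), ("sh", 0.2)],
}


def transduce(word, rules, max_chunk=2):
    """Return (best_output, log_prob), or None if no segmentation is covered.

    best[i] stores the highest-scoring (log_prob, output) pair for the
    prefix word[:i], so the search is a Viterbi pass over chunk boundaries.
    """
    best = [None] * (len(word) + 1)
    best[0] = (0.0, "")
    for i in range(len(word)):
        if best[i] is None:
            continue  # this prefix cannot be covered by any rule sequence
        score, out = best[i]
        for size in range(1, max_chunk + 1):
            chunk = word[i:i + size]
            if len(chunk) < size:
                break  # ran past the end of the word
            for target, prob in rules.get(chunk, []):
                cand = (score + math.log(prob), out + target)
                j = i + size
                if best[j] is None or cand[0] > best[j][0]:
                    best[j] = cand
    if best[-1] is None:
        return None
    log_prob, output = best[-1]
    return output, log_prob


result = transduce("kasha", RULES)
```

With these toy rules, the two-character chunk "sh" is rewritten as a unit because its best rule scores higher than rewriting "s" and "h" separately, which is the kind of behavior chunk-level rules are meant to capture. A word containing a symbol with no matching rule returns `None` in this sketch; a real system would need some fallback, such as copying the symbol unchanged.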

Cite

Sharma, S., & Singh, A. K. (2021). Word transduction for addressing the OOV problem in machine translation for similar resource-scarce languages. In Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing, FSMNLP 2017 (pp. 56–63). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W17-4007
