Word transduction for addressing the OOV problem in machine translation for similar resource-scarce languages

Shashikant Sharma; Anil Kumar Singh

Conference Proceedings (Open Access)

Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing, FSMNLP 2017 (2021) 56-63

DOI: 10.18653/v1/W17-4007

Abstract

Similar languages share a large number of cognate words, which can be exploited to deal with the out-of-vocabulary (OOV) word problem. This problem is especially severe for resource-scarce languages. We propose a method of 'word transduction' to address it. We take advantage of the fact that, although it is difficult to prepare a sentence-aligned parallel corpus for such languages, it is much easier to prepare a 'parallel' list of word pairs that are cognates and have similar pronunciations. The pronunciations (or orthographic representations) of OOV words can then be learned from such a parallel list. This could be done using phrase-based machine translation (PBMT). We show that, for a small amount of data, a model based on weighted rewrite rules for phoneme chunks outperforms a PBMT-based approach. We also point out that word transduction can be used to borrow words from another similar language and adapt them to the phonology of the target language.
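
To make the idea of weighted rewrite rules concrete, the following is a minimal sketch and not the authors' implementation. It assumes character-level chunks instead of phoneme chunks, uses Python's `difflib.SequenceMatcher` as a stand-in aligner, and estimates each rule weight as a relative frequency. The function names `learn_rules` and `transduce` and the word pairs are hypothetical, made-up toy strings, not data from the paper.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def learn_rules(pairs):
    """Estimate weights P(target chunk | source chunk) from a list of word pairs.

    Assumption: chunks come from a generic edit-based alignment, not from
    the phoneme chunking described in the abstract.
    """
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        ops = SequenceMatcher(None, src, tgt, autojunk=False).get_opcodes()
        for tag, i1, i2, j1, j2 in ops:
            if tag == "equal":
                # Unchanged characters count as identity rules.
                for ch in src[i1:i2]:
                    counts[ch][ch] += 1
            elif tag in ("replace", "delete"):
                # A changed (or dropped) source chunk rewrites to the target chunk.
                counts[src[i1:i2]][tgt[j1:j2]] += 1
            # "insert" spans have no source chunk and are skipped in this sketch.
    return {
        chunk: {out: n / sum(outs.values()) for out, n in outs.items()}
        for chunk, outs in counts.items()
    }


def transduce(word, rules):
    """Rewrite a word left to right, preferring the longest known chunk
    and, for that chunk, the highest-weighted output."""
    max_len = max((len(chunk) for chunk in rules), default=1)
    out, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(max(rules[chunk], key=rules[chunk].get))
                i += size
                break
        else:
            # No rule covers this position: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


# Toy, invented word pairs (source form, target form).
toy_pairs = [("kitab", "kitap"), ("sabab", "sabap"), ("garam", "karam")]
rules = learn_rules(toy_pairs)
print(transduce("gab", rules))
```

The rules here are context-free, so every occurrence of a source chunk is rewritten the same way. A fuller model would condition on neighbouring chunks and search over competing rule sequences instead of rewriting greedily.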

Cite

Sharma, S., & Singh, A. K. (2021). Word transduction for addressing the OOV problem in machine translation for similar resource-scarce languages. In Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing, FSMNLP 2017 (pp. 56–63). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W17-4007
