Similar languages share a large number of cognate words, which can be exploited to deal with the out-of-vocabulary (OOV) word problem. This problem is especially severe for resource-scarce languages. We propose a method of 'word transduction' to address it. We take advantage of the fact that, although it is difficult to prepare a sentence-aligned parallel corpus for such languages, it is much easier to prepare a 'parallel' list of word pairs that are cognates and have similar pronunciations. The pronunciations (or orthographic representations) of OOV words can then be learned from such a parallel list. One way to do this is phrase-based machine translation (PBMT). We show that, for small amounts of data, a model based on weighted rewrite rules for phoneme chunks outperforms a PBMT-based approach. We also point out that word transduction can be used to borrow words from a similar language and adapt them to the phonology of the target language.
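To make the idea of weighted rewrite rules over chunks concrete, here is a minimal sketch. It is not the authors' model. It assumes that each cognate pair has already been split into one-to-one aligned chunks, that a rule's weight is simply its relative frequency, and that decoding is greedy longest-match. The names `learn_rules` and `transduce` and the toy word pairs are hypothetical and carry no linguistic claim.

```python
from collections import Counter, defaultdict


def learn_rules(aligned_pairs):
    """Estimate rewrite-rule weights from pre-aligned chunk sequences.

    aligned_pairs: iterable of (src_chunks, tgt_chunks) tuples of equal length.
    Returns {src_chunk: {tgt_chunk: relative_frequency}}.
    Assumption: chunk alignment is given; the sketch does not learn it.
    """
    counts = defaultdict(Counter)
    for src_chunks, tgt_chunks in aligned_pairs:
        if len(src_chunks) != len(tgt_chunks):
            raise ValueError("chunk sequences must be aligned one-to-one")
        for src, tgt in zip(src_chunks, tgt_chunks):
            counts[src][tgt] += 1
    rules = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        rules[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return rules


def transduce(word, rules):
    """Rewrite a word left to right with greedy longest-match chunking.

    Each matched chunk is replaced by its highest-weight target chunk;
    characters not covered by any rule are copied unchanged.
    """
    max_len = max((len(src) for src in rules), default=1)
    out = []
    i = 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                candidates = rules[chunk]
                out.append(max(candidates, key=candidates.get))
                i += n
                break
        else:
            # No rule applies at this position: copy one character.
            out.append(word[i])
            i += 1
    return "".join(out)


# Toy, made-up cognate list (hypothetical strings, not real language data).
pairs = [
    (("k", "aa", "m"), ("k", "a", "m")),
    (("n", "aa", "m"), ("n", "a", "m")),
    (("g", "aa", "n"), ("g", "aa", "n")),
]
rules = learn_rules(pairs)
print(transduce("daam", rules))
```

In this toy setup the unseen word "daam" is rewritten using the rule for "aa" learned from the list, while the unseen character "d" is copied through. A fuller model would also need to learn the chunk alignment and score whole derivations rather than picking the best rule chunk by chunk.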
Sharma, S., & Singh, A. K. (2021). Word transduction for addressing the OOV problem in machine translation for similar resource-scarce languages. In Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing, FSMNLP 2017 (pp. 56–63). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W17-4007