Abstract
A language lexicon can be divided into four main strata, depending on origin of words: core vocabulary words, fully- and partially-assimilated foreign words, and unassimilated foreign words (or transliterations). This paper focuses on translation of fully- and partially-assimilated foreign words, called "borrowed words". Borrowed words (or loanwords) are content words found in nearly all languages, occupying up to 70% of the vocabulary. We use models of lexical borrowing in machine translation as a pivoting mechanism to obtain translations of out-of-vocabulary loanwords in a low-resource language. Our framework obtains substantial improvements (up to 1.6 BLEU) over standard baselines.
Cite
Tsvetkov, Y., & Dyer, C. (2015). Lexicon stratification for translating out-of-vocabulary words. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 125–131). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2021