Lexicon stratification for translating out-of-vocabulary words

Abstract

A language lexicon can be divided into four main strata, depending on origin of words: core vocabulary words, fully- and partially-assimilated foreign words, and unassimilated foreign words (or transliterations). This paper focuses on translation of fully- and partially-assimilated foreign words, called "borrowed words". Borrowed words (or loanwords) are content words found in nearly all languages, occupying up to 70% of the vocabulary. We use models of lexical borrowing in machine translation as a pivoting mechanism to obtain translations of out-of-vocabulary loanwords in a low-resource language. Our framework obtains substantial improvements (up to 1.6 BLEU) over standard baselines.
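
The pivoting idea in the abstract can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the authors' implementation: the function name `pivot_translate`, the two lookup tables, the scores, and the placeholder words are all hypothetical, and a real system would use a learned borrowing model and a donor-language translation model in place of the dictionaries. Summing scores over donor candidates is one simple combination rule chosen here for illustration.

```python
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]

# Hypothetical toy resources (assumptions, not data from the paper).
# Maps an OOV loanword in the low-resource language to scored candidate
# donor words in a resource-rich language.
BORROWING_MODEL: Dict[str, Scored] = {
    "loan_a": [("donor_a1", 0.7), ("donor_a2", 0.3)],
}

# Maps a donor word to scored translations in the target language.
DONOR_TRANSLATIONS: Dict[str, Scored] = {
    "donor_a1": [("book", 0.9), ("letter", 0.1)],
    "donor_a2": [("letter", 1.0)],
}


def pivot_translate(
    oov: str,
    borrowing_model: Dict[str, Scored],
    donor_translations: Dict[str, Scored],
    top_k: int = 3,
) -> Scored:
    """Rank target translations of an OOV loanword by pivoting through donors."""
    scores: Dict[str, float] = {}
    # Step 1: propose donor words the OOV word may have been borrowed from.
    for donor, p_donor in borrowing_model.get(oov, []):
        # Step 2: look up translations of each donor word.
        for target, p_target in donor_translations.get(donor, []):
            # Step 3: accumulate the combined score for each target word.
            scores[target] = scores.get(target, 0.0) + p_donor * p_target
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


candidates = pivot_translate("loan_a", BORROWING_MODEL, DONOR_TRANSLATIONS)
```

In this sketch, a word with no entry in the borrowing table yields an empty candidate list, so a caller could fall back to leaving the word untranslated.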

Cite

Tsvetkov, Y., & Dyer, C. (2015). Lexicon stratification for translating out-of-vocabulary words. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 125–131). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2021
