Finding words that aren’t there: Using word embeddings to improve dictionary search for low-resource languages

Abstract

Modern machine learning techniques have produced many impressive results in language technology, but they generally require amounts of training data many orders of magnitude greater than what exists for low-resource languages in general, and endangered languages in particular. However, dictionary definitions written in a comparatively well-resourced majority language can provide a link between low-resource languages and machine learning models trained on massive amounts of majority-language data. Promising results have been achieved by using word embeddings of these majority-language definitions in the search mechanisms of bilingual dictionaries of Plains Cree (nêhiyawêwin), Arapaho (Hinóno’éitíit), Northern Haida (Xaad Kíl), and Tsuut’ina (Tsúùt’ínà), four Indigenous languages spoken in North America. Search results for queries in the majority language of the definitions are not only more relevant, but can also be semantically relevant in ways not achievable with classic information retrieval techniques: users can successfully search for words that do not occur anywhere in the dictionary. Moreover, these techniques are directly applicable to any bilingual dictionary that provides translations between a high-resource and a low-resource language.
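To illustrate why embedding-based search can find "words that aren't there", here is a minimal sketch, not the authors' implementation. It assumes a simple baseline: each majority-language definition is embedded as the average of its word vectors, and entries are ranked by cosine similarity to the query. The three-dimensional vectors in `TOY_VECTORS`, the headwords `headword-1` to `headword-3`, and all function names are hypothetical stand-ins; a real system would load a pre-trained embedding with hundreds of dimensions and use actual dictionary entries.

```python
import math
import re
from typing import Dict, List, Optional

# Hypothetical toy word vectors standing in for a pre-trained
# majority-language embedding (real ones have hundreds of dimensions).
TOY_VECTORS: Dict[str, List[float]] = {
    "dog":   [0.90, 0.10, 0.00],
    "puppy": [0.85, 0.15, 0.05],
    "cat":   [0.70, 0.30, 0.00],
    "river": [0.05, 0.20, 0.90],
    "water": [0.00, 0.10, 0.95],
}

# Hypothetical bilingual dictionary: low-resource headword -> majority-language definition.
DICTIONARY: Dict[str, str] = {
    "headword-1": "dog",
    "headword-2": "cat",
    "headword-3": "river, flowing water",
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str) -> Optional[List[float]]:
    """Average the vectors of known words; returns None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in tokenize(text) if w in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def keyword_search(query: str, dictionary: Dict[str, str]) -> List[str]:
    """Classic baseline: return headwords whose definition shares a token with the query."""
    q = set(tokenize(query))
    return [hw for hw, d in dictionary.items() if q & set(tokenize(d))]


def embedding_search(query: str, dictionary: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank headwords by cosine similarity between query and definition embeddings."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for hw, definition in dictionary.items():
        e = embed(definition)
        if e is not None:
            scored.append((cosine(q, e), hw))
    scored.sort(reverse=True)  # highest similarity first
    return [hw for _, hw in scored[:top_k]]


if __name__ == "__main__":
    # "puppy" occurs in no definition, so keyword matching finds nothing...
    print(keyword_search("puppy", DICTIONARY))    # []
    # ...but its vector lies close to "dog" (and "cat"), so embedding search still succeeds.
    print(embedding_search("puppy", DICTIONARY))  # ['headword-1', 'headword-2']
```

The averaging step is the simplest possible sentence embedding and is used here only for clarity; a contextual sentence encoder could be swapped in for `embed` without changing the ranking logic.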

Citation (APA)

Arppe, A., Neitsch, A., Dacanay, D. B., Poulin, J., Hieber, D. W., & Harrigan, A. G. (2023). Finding words that aren’t there: Using word embeddings to improve dictionary search for low-resource languages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 144–155). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.americasnlp-1.15