Improving Low-Resource Languages in Pre-Trained Multilingual Language Models


Abstract

Pre-trained multilingual language models are the foundation of many NLP approaches, including cross-lingual transfer solutions. However, languages with small available monolingual corpora are often not well-supported by these models, leading to poor performance. We propose an unsupervised approach to improve the cross-lingual representations of low-resource languages by bootstrapping word translation pairs from monolingual corpora and using them to improve language alignment in pre-trained language models. We perform experiments on nine languages, using contextual word retrieval and zero-shot named entity recognition to measure both intrinsic cross-lingual word representation quality and downstream task performance, showing improvements on both tasks. Our results show that it is possible to improve pre-trained multilingual language models by relying only on non-parallel resources.
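
The sketch below is only an illustration of the bootstrapping idea mentioned in the abstract, not the authors' actual pipeline: it mines candidate word translation pairs from two sets of monolingual word vectors by taking mutual nearest neighbours under cosine similarity, a common heuristic for building a bilingual lexicon without parallel data. The vocab sizes, embeddings, and function names here are assumptions for illustration; the paper's real method and alignment objective are described in the EMNLP 2022 publication itself.

```python
# Minimal sketch (assumed, not the paper's implementation): bootstrap word
# translation pairs from monolingual word representations of two languages
# via mutual nearest neighbours under cosine similarity.
import numpy as np


def mutual_nearest_neighbours(src_emb: np.ndarray, tgt_emb: np.ndarray):
    """Return (src_index, tgt_index) pairs whose vectors are each other's
    nearest neighbour -- candidate translation pairs mined without any
    parallel data."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                      # cosine similarity matrix
    fwd = sim.argmax(axis=1)               # best target word for each source word
    bwd = sim.argmax(axis=0)               # best source word for each target word
    return [(i, int(fwd[i])) for i in range(len(fwd)) if bwd[fwd[i]] == i]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for monolingual word representations of two languages.
    src_vectors = rng.normal(size=(100, 64))
    tgt_vectors = rng.normal(size=(100, 64))
    pairs = mutual_nearest_neighbours(src_vectors, tgt_vectors)
    print(f"bootstrapped {len(pairs)} candidate translation pairs")
```

In the approach described by the abstract, pairs mined this way would then be used as supervision to improve the alignment between languages inside the pre-trained multilingual model; that alignment step is not shown here.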

Citation (APA)

Hangya, V., Saadi, H. S., & Fraser, A. (2022). Improving Low-Resource Languages in Pre-Trained Multilingual Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 11993–12006). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.822
