You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models

3 citations · 22 Mendeley readers

Abstract

Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized for their language. We use those teachers along with balanced (sub-sampled) data to distill the teachers' knowledge into a single multilingual student. Our method outperforms standard training methods in low-resource languages and retains performance on high-resource languages.
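
The abstract describes the training procedure only at a high level. As a rough illustration, the sketch below (in PyTorch) shows one way such distillation could look: frozen monolingual teachers supervise a single multilingual student on balanced, equally sized per-language batches via a temperature-scaled KL loss. The names (student, teachers, balanced_batches), the loss form, and the loop structure are assumptions for illustration, not the authors' released implementation.

# Minimal sketch of teacher-student distillation into a multilingual student.
# Module/variable names and the loss formulation are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between temperature-softened teacher and student distributions.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

def train_step(student, teachers, balanced_batches, optimizer, temperature=2.0):
    # One update: each language contributes an equally sized (sub-sampled) batch,
    # and the matching frozen monolingual teacher supervises the shared student.
    optimizer.zero_grad()
    total_loss = 0.0
    for lang, batch in balanced_batches.items():      # e.g. {"en": ..., "sw": ...}
        with torch.no_grad():
            teacher_logits = teachers[lang](batch)    # frozen monolingual teacher
        student_logits = student(batch)               # single multilingual student
        total_loss = total_loss + distillation_loss(
            student_logits, teacher_logits, temperature
        )
    total_loss.backward()
    optimizer.step()
    return float(total_loss)

Because every language contributes the same amount of data per step, low-resource languages are not drowned out by high-resource ones, while the teachers still carry knowledge learned from the full monolingual corpora.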

Citation (APA)

Limisiewicz, T., Malkin, D., & Stanovsky, G. (2023). You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models. In SIGTYP 2023 - 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Proceedings of the Workshop (pp. 1–11). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.sigtyp-1.1
