Abstract
Multilingual models have been widely used for cross-lingual transfer to low-resource languages. However, the performance on these languages is hindered by their underrepresentation in the pretraining data. To alleviate this problem, we propose a novel multilingual training technique based on teacher-student knowledge distillation. In this setting, we utilize monolingual teacher models optimized for their respective languages. We use these teachers, together with balanced (sub-sampled) data, to distill their knowledge into a single multilingual student. Our method outperforms standard training methods on low-resource languages and retains performance on high-resource languages.
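As a rough illustration of this setup (not the paper's released code), the sketch below shows one way the core distillation step could look: each language's sub-sampled batch is scored by that language's monolingual teacher, and the multilingual student is trained to match the teacher's soft output distributions. The names `teachers`, `student`, and `batch`, as well as the HuggingFace-style `.logits` interface, are assumptions for illustration only.

```python
# Minimal sketch of teacher-student distillation on balanced multilingual data.
# Assumes HuggingFace-style models whose forward pass returns an object with
# a `.logits` attribute; all names here are hypothetical placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KL loss between teacher and student token distributions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), scaled by T^2 as in standard distillation.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t * t

def train_step(student, teachers, batch, optimizer):
    """One update: the batch's language selects which monolingual teacher to match."""
    lang = batch["lang"]
    input_ids, attention_mask = batch["input_ids"], batch["attention_mask"]
    with torch.no_grad():
        teacher_logits = teachers[lang](input_ids, attention_mask=attention_mask).logits
    student_logits = student(input_ids, attention_mask=attention_mask).logits
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, the batches would come from a loader that sub-samples each language equally, so that high-resource languages do not dominate the student's updates.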
Citation
Limisiewicz, T., Malkin, D., & Stanovsky, G. (2023). You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models. In SIGTYP 2023 - 5th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, Proceedings of the Workshop (pp. 1–11). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.sigtyp-1.1