Larger-Scale Transformers for Multilingual Masked Language Modeling


Abstract

Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models, dubbed XLM-R XL and XLM-R XXL, outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests that larger-capacity models for language understanding may obtain strong performance on both high- and low-resource languages. We make our code and models publicly available.
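Since the paper states that the models are publicly released, a minimal sketch of using one of them for masked language modeling is shown below via the Hugging Face transformers library. The checkpoint identifier "facebook/xlm-roberta-xl" and the use of this library are assumptions, not details confirmed by the abstract.

```python
# A minimal sketch of masked-token prediction with the released 3.5B-parameter
# model. The model identifier below is an assumption; the paper only states
# that code and models are publicly available.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "facebook/xlm-roberta-xl"  # assumed checkpoint name for XLM-R XL
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Mask one token in a sentence and let the model fill it in.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and take the highest-scoring vocabulary entry.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```

The same pattern applies to the larger 10.7B-parameter XLM-R XXL checkpoint, subject to available GPU memory.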

Citation (APA)

Goyal, N., Du, J., Ott, M., Anantharaman, G., & Conneau, A. (2021). Larger-Scale Transformers for Multilingual Masked Language Modeling. In RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop (pp. 29–33). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.repl4nlp-1.4
