Larger-Scale Transformers for Multilingual Masked Language Modeling

Naman Goyal; Jingfei Du; Myle Ott; Giri Anantharaman; Alexis Conneau

Conference Proceedings, Open Access

RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop (2021) 29-33

DOI: 10.18653/v1/2021.repl4nlp-1.4

Abstract

Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models, dubbed XLM-R XL and XLM-R XXL, outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI, respectively. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests that larger-capacity models for language understanding may obtain strong performance on both high- and low-resource languages. We make our code and models publicly available.

Cite

Goyal, N., Du, J., Ott, M., Anantharaman, G., & Conneau, A. (2021). Larger-Scale Transformers for Multilingual Masked Language Modeling. In RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop (pp. 29–33). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.repl4nlp-1.4
