Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure

Abstract

Multilingual pre-trained language models such as mBERT and XLM-R have shown impressive cross-lingual ability. Surprisingly, both are trained with a multilingual masked language modeling (MLM) objective, without any cross-lingual supervision or aligned data. Despite the encouraging results, we still lack a clear understanding of why cross-lingual ability emerges from multilingual MLM. In this work, we argue that cross-lingual ability comes from the commonality between languages. Specifically, we study three language properties: constituent order, composition, and word co-occurrence. First, we create an artificial language by modifying a property of the source language. Then we measure the contribution of the modified property through the change in cross-lingual transfer results on the target language. We conduct experiments on six languages and two cross-lingual NLP tasks (textual entailment and sentence retrieval). Our main conclusion is that the contributions of constituent order and word co-occurrence are limited, while composition is more crucial to the success of cross-lingual transfer.

Cite

Chai, Y., Liang, Y., & Duan, N. (2022). Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 4702–4712). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.322
