Multilingual Pre-training with Self-supervision from Global Co-occurrence

ISSN: 0736-587X

Abstract

Global co-occurrence information is the primary source of structural information in multilingual corpora, and we find that analogical/parallel compound words across languages have similar (normalized) co-occurrence counts/frequencies, providing weak but stable self-supervision for cross-lingual transfer. Following this observation, we aim to associate contextualized representations with relevant (contextualized) representations across languages with the help of co-occurrence counts. The result is MLM-GC (MLM with Global Co-occurrence) pre-training, in which the model learns local bidirectional information from MLM and global co-occurrence information from a log-bilinear regression. Experiments show that MLM-GC pre-training substantially outperforms MLM pre-training on 4 downstream cross-lingual tasks and 1 additional monolingual task, demonstrating the advantages of forming isomorphic spaces across languages.
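
As a rough illustration of the combined objective the abstract describes (a sketch, not the authors' released code): a standard masked-LM cross-entropy term plus a GloVe-style log-bilinear regression that pushes the dot product of two co-occurring contextualized representations toward the log of their global co-occurrence count. The function names, the mixing weight lam, and the GloVe-style weighting f(X_ij) below are assumptions made for illustration only.

    # Sketch of an MLM + global co-occurrence (log-bilinear) objective.
    # The pairing of representations and all hyperparameters are assumed, not taken from the paper.
    import torch
    import torch.nn.functional as F

    def glove_weight(counts, x_max=100.0, alpha=0.75):
        # GloVe-style weighting f(X_ij): (X_ij / x_max)^alpha, capped at 1.
        return (counts / x_max).clamp(max=1.0) ** alpha

    def log_bilinear_loss(h_i, h_j, b_i, b_j, counts):
        # h_i, h_j: (batch, dim) contextualized representations of a co-occurring pair
        # b_i, b_j: (batch,) learned bias terms; counts: (batch,) co-occurrence counts X_ij > 0
        pred = (h_i * h_j).sum(dim=-1) + b_i + b_j   # h_i . h_j + b_i + b_j
        target = torch.log(counts)                   # regress toward log X_ij
        return (glove_weight(counts) * (pred - target) ** 2).mean()

    def mlm_gc_loss(mlm_logits, mlm_labels, h_i, h_j, b_i, b_j, counts, lam=1.0):
        # Total objective: masked-LM cross-entropy plus the log-bilinear regression term,
        # mixed with an assumed weight lam.
        mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                              mlm_labels.view(-1), ignore_index=-100)
        return mlm + lam * log_bilinear_loss(h_i, h_j, b_i, b_j, counts)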

Citation (APA)

Ai, X., & Fang, B. (2023). Multilingual Pre-training with Self-supervision from Global Co-occurrence. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 7526–7543). Association for Computational Linguistics (ACL).
