English Contrastive Learning Can Learn Universal Cross-lingual Sentence Embeddings

Abstract

Universal cross-lingual sentence embeddings map semantically similar sentences from different languages into a shared embedding space. Aligning cross-lingual sentence embeddings usually requires supervised cross-lingual parallel sentences. In this work, we propose mSimCSE, which extends SimCSE (Gao et al., 2021) to multilingual settings, and we show that contrastive learning on English data alone can, surprisingly, learn high-quality universal cross-lingual sentence embeddings without any parallel data. In unsupervised and weakly supervised settings, mSimCSE significantly improves over previous sentence embedding methods on cross-lingual retrieval and multilingual STS tasks. Unsupervised mSimCSE performs comparably to fully supervised methods on low-resource language retrieval and multilingual STS, and its performance can be further improved when cross-lingual NLI data is available.
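Conceptually, mSimCSE's unsupervised objective follows SimCSE: each English sentence is encoded twice with different dropout masks to form a positive pair, and other sentences in the batch serve as negatives; applying this to a multilingual backbone is what yields cross-lingual alignment. The sketch below illustrates the idea, assuming a HuggingFace XLM-R backbone, CLS-token pooling, and an illustrative temperature of 0.05; it is a minimal reconstruction for intuition, not the authors' released implementation or exact hyperparameters.

```python
# Minimal sketch of SimCSE-style unsupervised contrastive training on a
# multilingual encoder. Assumptions (not from the paper text above):
# xlm-roberta-base backbone, CLS pooling, temperature 0.05.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"  # assumed multilingual backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
encoder.train()  # keep dropout active: two passes give two stochastic "views"

def embed(sentences):
    """Encode a batch of sentences; dropout makes each pass different."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # CLS-token pooling

sentences = ["A man is playing guitar.", "Two dogs run in a field."]
z1, z2 = embed(sentences), embed(sentences)  # same inputs, different dropout

# InfoNCE loss: each sentence's second view is its positive; the other
# sentences in the batch act as in-batch negatives.
sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / 0.05
labels = torch.arange(sim.size(0))
loss = F.cross_entropy(sim, labels)
loss.backward()  # an optimizer step would follow in a real training loop
```

Note that only English sentences appear in training; the cross-lingual alignment reported in the abstract emerges from the multilingual pre-training of the backbone combined with this monolingual contrastive objective.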

Citation (APA)

Wang, Y. S., Wu, A., & Neubig, G. (2022). English Contrastive Learning Can Learn Universal Cross-lingual Sentence Embeddings. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 9122–9133). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.621
