Encoding both language-specific and language-agnostic information into a single high-dimensional space is common practice in pre-trained Multi-lingual Language Models (pMLM). Such encoding has been shown to be effective on natural language tasks that require the semantics of the whole sentence (e.g., translation). However, its effectiveness appears limited on tasks that require only partial information from the utterance (e.g., multi-lingual entity retrieval, template retrieval, and semantic alignment). In this work, a novel Fine-grained Multilingual Disentangled Autoencoder (FMDA) is proposed to disentangle fine-grained semantic information from language-specific information in a multi-lingual setting. FMDA successfully extracts disentangled template-semantic and residual-semantic representations. Experiments on the MASSIVE dataset demonstrate that the two disentangled encodings boost each other during training, consistently outperforming the original pMLM and a strong language-disentanglement baseline on monolingual template retrieval and cross-lingual semantic retrieval tasks across multiple languages.
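To make the disentanglement idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of splitting a pMLM sentence embedding into a template-semantic part and a residual part with two projection heads, plus a decoder that reconstructs the original embedding from their concatenation. The module names, dimensions, and the reconstruction-only loss are assumptions for illustration; the paper's actual FMDA adds further disentanglement objectives not shown here.

import torch
import torch.nn as nn


class DisentangledAutoencoder(nn.Module):
    """Toy disentangled autoencoder over frozen pMLM sentence embeddings."""

    def __init__(self, embed_dim: int = 768, latent_dim: int = 256):
        super().__init__()
        # Two heads carve the shared multilingual embedding into two sub-spaces.
        self.template_head = nn.Linear(embed_dim, latent_dim)  # language-agnostic template semantics
        self.residual_head = nn.Linear(embed_dim, latent_dim)  # remaining (language-specific) information
        self.decoder = nn.Linear(2 * latent_dim, embed_dim)    # reconstruct the original embedding

    def forward(self, sentence_embedding: torch.Tensor):
        z_template = self.template_head(sentence_embedding)
        z_residual = self.residual_head(sentence_embedding)
        reconstruction = self.decoder(torch.cat([z_template, z_residual], dim=-1))
        return z_template, z_residual, reconstruction


# Usage with embeddings from any pMLM encoder (e.g., pooled mBERT/XLM-R outputs).
model = DisentangledAutoencoder()
emb = torch.randn(4, 768)                        # batch of 4 hypothetical sentence embeddings
z_t, z_r, recon = model(emb)
recon_loss = nn.functional.mse_loss(recon, emb)  # reconstruction term only; disentanglement
                                                 # losses on z_t / z_r are omitted in this sketch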
CITATION STYLE
Wu, Z., Lu, S., Sun, Z., Ma, C., Zhao, Z., & Guo, C. (2022). Fine-grained Multi-lingual Disentangled Autoencoder for Language-agnostic Representation Learning. In Proceedings of the Massively Multilingual Natural Language Understanding Workshop (MMNLU-22) (pp. 12–24). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.mmnlu-1.2