Adversarial Domain Adaptation for Cross-lingual Information Retrieval with Multilingual BERT

14 citations · 20 Mendeley readers

Abstract

Transformer-based language models (e.g., BERT, RoBERTa, GPT) have shown remarkable performance on many natural language processing tasks, and their multilingual variants make it easier to handle cross-lingual tasks without a machine translation system. In this paper, we apply multilingual BERT to the cross-lingual information retrieval (CLIR) task, using a triplet loss to learn the relevance between queries and documents written in different languages. Moreover, we align token embeddings from different languages via adversarial networks to help the language model learn cross-lingual sentence representations. We achieve state-of-the-art results on the newly published CLIR dataset CLIRMatrix. Furthermore, we show that the adversarial multilingual BERT also achieves competitive results in the zero-shot setting for some languages when CLIR training data in those languages is lacking.
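The two objectives the abstract describes — a triplet loss over query/document pairs and an adversarial language discriminator over the embeddings — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy mean-pooling encoder stands in for multilingual BERT, the two-way discriminator and the gradient-reversal trick (borrowed from domain-adversarial training) are our reading of "align the token embeddings via adversarial networks", and all shapes and names are invented for the example.

```python
import torch
import torch.nn as nn

# Gradient reversal: forward is the identity, backward negates the gradient,
# so minimizing the discriminator loss trains the discriminator while pushing
# the encoder toward language-invariant embeddings in one backward pass.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class ToyEncoder(nn.Module):
    """Stand-in for multilingual BERT: embed token ids, mean-pool over tokens."""
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, ids):
        return self.emb(ids).mean(dim=1)

encoder = ToyEncoder()
discriminator = nn.Linear(32, 2)            # predicts which language an embedding came from
triplet = nn.TripletMarginLoss(margin=1.0)
xent = nn.CrossEntropyLoss()

# Toy batch: a query in one language, documents in another.
query   = torch.randint(0, 1000, (4, 8))    # anchor
pos_doc = torch.randint(0, 1000, (4, 16))   # relevant document
neg_doc = torch.randint(0, 1000, (4, 16))   # irrelevant document

q, p, n = encoder(query), encoder(pos_doc), encoder(neg_doc)

# Retrieval objective: relevant documents closer to the query than irrelevant ones.
retrieval_loss = triplet(q, p, n)

# Adversarial objective: the discriminator tries to tell the two languages apart;
# the reversed gradient makes the encoder hide that signal.
embs = torch.cat([q, p], dim=0)
lang_labels = torch.tensor([0] * 4 + [1] * 4)
adv_loss = xent(discriminator(GradReverse.apply(embs)), lang_labels)

loss = retrieval_loss + adv_loss
loss.backward()
```

In a real setup the encoder would be `bert-base-multilingual-cased` fine-tuned end to end, and the adversarial term would typically be weighted against the retrieval term.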

Citation (APA)

Wang, R., Zhang, Z., Zhuang, F., Gao, D., Wei, Y., & He, Q. (2021). Adversarial Domain Adaptation for Cross-lingual Information Retrieval with Multilingual BERT. In International Conference on Information and Knowledge Management, Proceedings (pp. 3498–3502). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482050
