Adversarial Domain Adaptation for Cross-lingual Information Retrieval with Multilingual BERT

14 citations · 20 Mendeley readers

Abstract

Transformer-based language models (e.g., BERT, RoBERTa, GPT) have shown remarkable performance on many natural language processing tasks, and their multilingual variants make it easier to handle cross-lingual tasks without a machine translation system. In this paper, we apply multilingual BERT to the cross-lingual information retrieval (CLIR) task, using a triplet loss to learn the relevance between queries and documents written in different languages. Moreover, we align token embeddings from different languages via adversarial networks to help the language model learn cross-lingual sentence representations. We achieve state-of-the-art results on the newly published CLIR dataset CLIRMatrix. Furthermore, we show that the adversarial multilingual BERT also achieves competitive results in the zero-shot setting for some languages when CLIR training data in those languages is lacking.
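The two objectives the abstract describes — a triplet loss over query/document pairs and an adversarial language discriminator over the embeddings — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy mean-pooling encoder stands in for multilingual BERT, the two-way discriminator and the gradient-reversal trick (borrowed from domain-adversarial training) are our reading of "align the token embeddings via adversarial networks", and all shapes and names are invented for the example.

```python
import torch
import torch.nn as nn

# Gradient reversal: forward is the identity, backward negates the gradient,
# so minimizing the discriminator loss trains the discriminator while pushing
# the encoder toward language-invariant embeddings in one backward pass.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class ToyEncoder(nn.Module):
    """Stand-in for multilingual BERT: embed token ids, mean-pool over tokens."""
    def __init__(self, vocab_size=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, ids):
        return self.emb(ids).mean(dim=1)

encoder = ToyEncoder()
discriminator = nn.Linear(32, 2)            # predicts which language an embedding came from
triplet = nn.TripletMarginLoss(margin=1.0)
xent = nn.CrossEntropyLoss()

# Toy batch: a query in one language, documents in another.
query   = torch.randint(0, 1000, (4, 8))    # anchor
pos_doc = torch.randint(0, 1000, (4, 16))   # relevant document
neg_doc = torch.randint(0, 1000, (4, 16))   # irrelevant document

q, p, n = encoder(query), encoder(pos_doc), encoder(neg_doc)

# Retrieval objective: relevant documents closer to the query than irrelevant ones.
retrieval_loss = triplet(q, p, n)

# Adversarial objective: the discriminator tries to tell the two languages apart;
# the reversed gradient makes the encoder hide that signal.
embs = torch.cat([q, p], dim=0)
lang_labels = torch.tensor([0] * 4 + [1] * 4)
adv_loss = xent(discriminator(GradReverse.apply(embs)), lang_labels)

loss = retrieval_loss + adv_loss
loss.backward()
```

In a real setup the encoder would be `bert-base-multilingual-cased` fine-tuned end to end, and the adversarial term would typically be weighted against the retrieval term.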

Citation (APA)

Wang, R., Zhang, Z., Zhuang, F., Gao, D., Wei, Y., & He, Q. (2021). Adversarial Domain Adaptation for Cross-lingual Information Retrieval with Multilingual BERT. In International Conference on Information and Knowledge Management, Proceedings (pp. 3498–3502). Association for Computing Machinery. https://doi.org/10.1145/3459637.3482050
