Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

Abstract

Multilingual Pretrained Language Models (MPLMs) perform strongly in cross-lingual transfer. We propose Prompts Augmented by Retrieval Crosslingually (PARC), which improves zero-shot performance on low-resource languages (LRLs) by augmenting the context with prompts built from semantically similar sentences retrieved from a high-resource language (HRL). On three downstream tasks (sentiment classification, topic categorization, natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families, PARC improves zero-shot performance in both the unlabeled (+5.1%) and labeled (+16.3%) settings, and also outperforms finetuning by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on the one hand, and both the similarity between the high- and low-resource languages and the amount of low-resource pretraining data on the other. A robustness analysis suggests that PARC can achieve even stronger performance with more powerful MPLMs.
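For intuition, below is a minimal sketch of PARC-style retrieval-augmented prompting in the labeled setting (retrieved HRL sentences come with labels), assuming sentence-transformers and transformers are installed. The model names (LaBSE, mBERT), the prompt template, the toy HRL pool, and the verbalizer are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of PARC-style cross-lingual retrieval-augmented prompting
# (labeled setting). All concrete choices below are assumptions for
# illustration, not the paper's exact setup.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForMaskedLM, AutoTokenizer

retriever = SentenceTransformer("sentence-transformers/LaBSE")  # multilingual sentence encoder
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Toy labeled high-resource-language (English) pool, and a verbalizer that
# maps class labels to words; assumes these words are single tokens in the
# MLM's vocabulary.
hrl_pool = [
    ("The movie was wonderful.", "good"),
    ("A complete waste of time.", "bad"),
]
verbalizer = {"good": "good", "bad": "bad"}

def parc_predict(lrl_sentence: str, k: int = 1) -> str:
    # 1) Retrieve the k most similar HRL sentences for the LRL input.
    query_emb = retriever.encode(lrl_sentence, convert_to_tensor=True)
    pool_embs = retriever.encode([s for s, _ in hrl_pool], convert_to_tensor=True)
    hits = util.semantic_search(query_emb, pool_embs, top_k=k)[0]

    # 2) Build a cloze-style prompt: retrieved labeled HRL examples as
    #    context, then the LRL input with a [MASK] slot for its label.
    context = " ".join(
        f"{hrl_pool[h['corpus_id']][0]} It was {hrl_pool[h['corpus_id']][1]}."
        for h in hits
    )
    prompt = f"{context} {lrl_sentence} It was {tokenizer.mask_token}."

    # 3) Score each verbalizer token at the mask position; highest wins.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = mlm(**inputs).logits[0, mask_pos]
    return max(
        verbalizer,
        key=lambda lbl: logits[tokenizer.convert_tokens_to_ids(verbalizer[lbl])].item(),
    )
```

Usage would look like `parc_predict("Der Film war großartig.")`: the encoder retrieves the closest labeled English sentence, and the MLM fills the mask with whichever verbalizer token scores higher.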

Cite (APA)

Nie, E., Liang, S., Schmid, H., & Schütze, H. (2023). Cross-lingual retrieval augmented prompt for low-resource languages. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 8320–8340). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.528
