Abstract
Few-shot cross-lingual transfer, which fine-tunes a Multilingual Masked Language Model (MMLM) with source language labeled data and a small amount of target language labeled data, provides excellent performance in the target language. However, if no labeled data in the target language are available, they must be created through human annotation. In this study, we devise a metric to select annotation candidates from an unlabeled data pool that efficiently enhance accuracy for few-shot cross-lingual transfer. It is known that training a model with hard examples is important for improving the model's performance. Therefore, we first identify examples that MMLM cannot solve in a zero-shot cross-lingual transfer setting and demonstrate that it is hard to predict peculiar examples in the target language, i.e., examples distant from the source language examples in the cross-lingual semantic space of the MMLM. We then choose high-peculiarity examples as annotation candidates and perform few-shot cross-lingual transfer. In comprehensive experiments with 20 languages and 6 tasks, we demonstrate that the high-peculiarity examples improve target language accuracy compared to other candidate selection methods proposed in previous studies. The code used in our experiments is available at https://github.com/hwichan0720/fewshot_transfer_with_peculiarity.
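The abstract describes peculiarity as the distance of a target language example from the source language examples in the MMLM's cross-lingual semantic space. A minimal sketch of that idea, assuming sentence embeddings are already extracted from the MMLM and scoring each target example by its mean cosine distance to its k nearest source examples (the paper's exact metric and neighborhood definition may differ):

```python
import numpy as np

def peculiarity_scores(src_embs: np.ndarray, tgt_embs: np.ndarray, k: int = 5) -> np.ndarray:
    """Score each target example by its distance to the k nearest
    source examples in a shared embedding space (illustrative; the
    paper's actual metric may differ)."""
    # L2-normalize so dot products become cosine similarities.
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sims = tgt @ src.T                      # (n_tgt, n_src) cosine similarities
    topk = np.sort(sims, axis=1)[:, -k:]    # k most similar source examples
    return 1.0 - topk.mean(axis=1)          # higher score = more peculiar

def select_annotation_candidates(src_embs: np.ndarray, tgt_embs: np.ndarray,
                                 budget: int, k: int = 5) -> np.ndarray:
    """Return indices of the `budget` most peculiar target examples,
    i.e., the candidates to send for human annotation."""
    scores = peculiarity_scores(src_embs, tgt_embs, k)
    return np.argsort(scores)[::-1][:budget]
```

Under this sketch, target examples that sit close to source examples in the embedding space receive low scores, while examples far from any source neighborhood receive high scores and are selected for annotation first.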
Kim, H., & Komachi, M. (2023). Enhancing Few-shot Cross-lingual Transfer with Target Language Peculiar Examples. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 747–767). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.47