UXLA: A robust unsupervised data augmentation framework for zero-resource cross-lingual NLP

M. Saiful Bari; Tasnim Mohiuddin; Shafiq Joty

Conference Proceedings, Open Access

ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (2021) 1978-1992

DOI: 10.18653/v1/2021.acl-long.154

Abstract

Transfer learning has yielded state-of-the-art (SoTA) results in many supervised NLP tasks. However, annotated data for every target task in every target language is rare, especially for low-resource languages. We propose UXLA, a novel unsupervised data augmentation framework for zero-resource transfer learning scenarios. In particular, UXLA aims to solve cross-lingual adaptation problems from a source language task distribution to an unknown target language task distribution, assuming no training labels in the target language. At its core, UXLA performs simultaneous self-training with data augmentation and unsupervised sample selection. To show its effectiveness, we conduct extensive experiments on three diverse zero-resource cross-lingual transfer tasks. UXLA achieves SoTA results in all three tasks, outperforming the baselines by a good margin. With an in-depth framework dissection, we demonstrate the cumulative contributions of different components to its success.

Cite

Bari, M. S., Mohiuddin, T., & Joty, S. (2021). UXLA: A robust unsupervised data augmentation framework for zero-resource cross-lingual NLP. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 1978–1992). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-long.154
