PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks

74 citations · 131 Mendeley readers

Abstract

This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks. We propose the Prompt-based Data Augmentation model (PromDA), which trains only a small-scale Soft Prompt (i.e., a set of trainable vectors) in frozen Pre-trained Language Models (PLMs). This avoids human effort in collecting unlabeled in-domain data and maintains the quality of the generated synthetic data. In addition, PromDA generates synthetic data from two different views and filters out low-quality examples using NLU models. Experiments on four benchmarks show that the synthetic data produced by PromDA successfully boost the performance of NLU models, which consistently outperform several competitive baselines, including a state-of-the-art semi-supervised model that uses unlabeled in-domain data. The synthetic data from PromDA are also complementary to unlabeled in-domain data: the NLU models can be further improved when the two are combined for training.
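
For concreteness, the sketch below illustrates the two core mechanics the abstract describes: a small set of trainable prompt vectors prepended to a frozen PLM, and consistency filtering of generated examples with an NLU model. It is a minimal illustration assuming a T5-style PLM via Hugging Face transformers; the prompt length, the predict helper, and the exact filtering rule are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn
from transformers import T5ForConditionalGeneration

class SoftPromptT5(nn.Module):
    """Trains only a small Soft Prompt; the underlying PLM stays frozen."""

    def __init__(self, model_name="t5-base", prompt_len=20):
        super().__init__()
        self.plm = T5ForConditionalGeneration.from_pretrained(model_name)
        for p in self.plm.parameters():  # freeze all PLM weights
            p.requires_grad = False
        emb_dim = self.plm.get_input_embeddings().embedding_dim
        # The only trainable parameters: prompt_len continuous vectors.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, emb_dim) * 0.02)

    def forward(self, input_ids, attention_mask, labels):
        tok_emb = self.plm.get_input_embeddings()(input_ids)
        batch = input_ids.size(0)
        # Prepend the soft prompt to the token embeddings of every example.
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        prompt_mask = torch.ones(batch, prompt.size(1),
                                 device=attention_mask.device,
                                 dtype=attention_mask.dtype)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.plm(inputs_embeds=inputs_embeds,
                        attention_mask=attention_mask,
                        labels=labels)

def consistency_filter(nlu_model, synthetic_pairs):
    # Keep only generated (text, label) pairs whose label the NLU model
    # reproduces; `nlu_model.predict` is a hypothetical helper standing in
    # for the paper's NLU-based filtering step.
    return [(x, y) for x, y in synthetic_pairs if nlu_model.predict(x) == y]

The abstract's "two different views" would correspond to generating from two different conditioning signals (e.g., from output labels versus from input keywords); the single generator class above stands in for either view.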

Citation (APA)

Wang, Y., Xu, C., Sun, Q., Hu, H., Tao, C., Geng, X., & Jiang, D. (2022). PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 4242–4255). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.292
