Abstract
Data augmentation is widely used in low-resource named entity recognition (NER) to tackle data sparsity. However, previous data augmentation methods suffer from disrupted syntactic structures, token-label mismatches, and a reliance on external knowledge or manual effort. To address these issues, we propose Robust Prompt-based Data Augmentation (RoPDA) for low-resource NER. Built on pre-trained language models (PLMs) with continuous prompts, RoPDA performs entity augmentation and context augmentation through five fundamental augmentation operations to generate label-flipping and label-preserving examples. To make the best use of the augmented samples, we present two techniques: self-consistency filtering and mixup. The former eliminates low-quality samples using a bidirectional mask, while the latter prevents the performance degradation that arises from using label-flipping samples directly. Extensive experiments on three popular benchmarks from different domains demonstrate that RoPDA significantly improves upon strong baselines and also outperforms state-of-the-art semi-supervised learning methods when unlabeled data is included.
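For readers unfamiliar with mixup, the following is a minimal sketch of the generic technique: a convex combination of two examples and of their label distributions. It is not RoPDA's specific formulation, which the abstract does not detail. The function name `mixup`, the plain-list vector representation, and the default `alpha` are assumptions made for illustration only.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.5, rng=random):
    """Generic mixup (illustrative sketch, not the paper's exact variant).

    x_a, x_b: feature vectors (e.g. token embeddings) of equal length.
    y_a, y_b: label distributions (e.g. one-hot vectors) of equal length.
    alpha:    Beta(alpha, alpha) shape parameter; hypothetical default.
    rng:      any object exposing betavariate(), such as random.Random(seed).
    """
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("paired examples must have matching shapes")
    # Sample the mixing weight lam in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Interpolate features and labels with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    # Mix an original example with a hypothetical label-flipped counterpart.
    x_mix, y_mix, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
                              rng=random.Random(0))
    print(lam, x_mix, y_mix)
```

Because the labels are interpolated with the same weight as the features, a label-flipping sample contributes a soft target rather than a hard, possibly noisy one, which is the general intuition behind using mixup instead of training on such samples directly.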
Cite
Song, S., Shen, F., & Zhao, J. (2024). RoPDA: Robust Prompt-Based Data Augmentation for Low-Resource Named Entity Recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 38, pp. 19017–19025). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i17.29868