Substructure Substitution: Structured Data Augmentation for NLP


Abstract

We study a family of data augmentation methods, substructure substitution (SUB2), that generalizes prior methods. SUB2 generates new examples by substituting substructures (e.g., subtrees or subsequences) with others that have the same label. This idea can be applied to many structured NLP tasks such as part-of-speech tagging and parsing. For more general tasks (e.g., text classification) that do not have explicitly annotated substructures, we present variations of SUB2 based on text spans or parse trees, introducing structure-aware data augmentation methods to general NLP tasks. In most cases, training with a dataset augmented by SUB2 achieves better performance than training with the original training set. Further experiments show that SUB2 has more consistent performance than other investigated augmentation methods, across different tasks and sizes of the seed dataset.
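To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of SUB2 for sequence labeling, where a "substructure" is simplified to a fixed-length span of tokens and its tag subsequence serves as the label: spans with identical tag subsequences are swapped across examples to produce new, label-consistent training data. The function name `sub2_augment` and its parameters are hypothetical.

```python
import random

def sub2_augment(dataset, num_new=1, span_len=1, seed=0):
    """Generate new tagged sequences by swapping same-label substructures.

    dataset: list of (tokens, tags) pairs. Here a substructure is a
    span of `span_len` tokens, labeled by its tag subsequence -- a
    simplification of the paper's subtree/subsequence substructures.
    """
    rng = random.Random(seed)
    # Index every span by its tag-subsequence "label".
    index = {}
    for tokens, tags in dataset:
        for i in range(len(tokens) - span_len + 1):
            label = tuple(tags[i:i + span_len])
            index.setdefault(label, []).append(tokens[i:i + span_len])
    new_examples = []
    for _ in range(num_new):
        # Pick a seed example and a span inside it, then substitute
        # that span with another span carrying the same label.
        tokens, tags = rng.choice(dataset)
        i = rng.randrange(len(tokens) - span_len + 1)
        label = tuple(tags[i:i + span_len])
        replacement = rng.choice(index[label])
        new_tokens = tokens[:i] + replacement + tokens[i + span_len:]
        new_examples.append((new_tokens, list(tags)))
    return new_examples
```

Because the replacement span carries the same tag subsequence, the tag sequence of each generated example remains valid by construction; the same principle extends to subtrees for parsing tasks.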

Cite (APA)

Shi, H., Livescu, K., & Gimpel, K. (2021). Substructure Substitution: Structured Data Augmentation for NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 3494–3508). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.307
