Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks


Abstract

Before entering the neural network, a token is generally converted to its corresponding one-hot representation, which is a discrete distribution over the vocabulary. A smoothed representation, by contrast, is the probability distribution over candidate tokens obtained from a pre-trained masked language model, and can be seen as a more informative substitute for the one-hot representation. We propose an efficient data augmentation method, termed text smoothing, which converts a sentence from its one-hot representation to a controllable smoothed representation. We evaluate text smoothing on different benchmarks in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with those data augmentation methods to achieve better performance. Our code is available at https://github.com/caskcsg/TextSmoothing.
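To make the idea concrete, here is a minimal sketch of the smoothing step using Hugging Face Transformers with bert-base-uncased as the pre-trained masked language model. The function name smoothed_representation and the interpolation weight lam are illustrative choices, not the authors' code; see the linked repository for the official implementation.

```python
# Minimal sketch of text smoothing: interpolate each token's one-hot
# vector with the MLM's predicted distribution over the vocabulary.
# Assumptions (not from the paper's code): BERT as the MLM, and `lam`
# as the name of the controllable mixing weight.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def smoothed_representation(sentence: str, lam: float = 0.5) -> torch.Tensor:
    """Return a (seq_len, vocab_size) per-token distribution that mixes
    the one-hot encoding with the MLM's candidate-token probabilities."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**inputs).logits[0]        # (seq_len, vocab_size)
    mlm_probs = F.softmax(logits, dim=-1)        # smoothed candidates
    onehot = F.one_hot(inputs["input_ids"][0],
                       num_classes=mlm_probs.size(-1)).float()
    # Controllable smoothing: lam=1 recovers the one-hot input,
    # lam=0 uses the MLM distribution alone.
    return lam * onehot + (1.0 - lam) * mlm_probs

# The smoothed distribution can replace the usual embedding lookup by
# taking a weighted mix of word embeddings, e.g. for a downstream model:
rep = smoothed_representation("the movie was great", lam=0.5)
emb = rep @ mlm.get_input_embeddings().weight    # (seq_len, hidden_size)
```

In this sketch, the smoothed rows act as soft token mixtures: multiplying them by the embedding matrix yields augmented input embeddings that blend each original token with its plausible substitutes, which is what makes the representation more informative than a plain one-hot lookup.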

Citation (APA)

Wu, X., Gao, C., Lin, M., Zang, L., & Hu, S. (2022). Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 871–875). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-short.97
