Towards textual data augmentation for neural networks: Synonyms and maximum loss

Abstract

Data augmentation is one of the ways to deal with labeled data scarcity and overfitting. Both of these problems are crucial for modern deep-learning algorithms, which require massive amounts of data. The problem is better explored for image analysis than for text; this work is a step toward closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on substituting words with synonyms drawn from a thesaurus, namely Princeton University's WordNet. Our method improves upon the baseline in most of the cases. In terms of accuracy, the best of the variants is 1.2 percentage points better than the baseline.
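
The following is a minimal sketch of the general idea described in the abstract: replacing words in a training sentence with WordNet synonyms to generate additional examples. It uses NLTK's WordNet interface and is an illustrative approximation only; the function names (get_synonyms, augment_sentence) and the replace_prob parameter are assumptions, and the paper's maximum-loss selection of substitutions is not reproduced here.

```python
# Illustrative WordNet-based synonym substitution for text augmentation.
# Not the authors' exact procedure; their method additionally picks the
# substitution that maximizes the training loss.
import random
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')


def get_synonyms(word):
    """Collect single-word WordNet synonyms of `word`, excluding the word itself."""
    synonyms = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            candidate = lemma.name().replace("_", " ")
            if candidate.lower() != word.lower() and " " not in candidate:
                synonyms.add(candidate)
    return sorted(synonyms)


def augment_sentence(sentence, replace_prob=0.1, rng=random):
    """Return a copy of `sentence` with some words replaced by WordNet synonyms."""
    augmented = []
    for token in sentence.split():
        synonyms = get_synonyms(token)
        if synonyms and rng.random() < replace_prob:
            augmented.append(rng.choice(synonyms))
        else:
            augmented.append(token)
    return " ".join(augmented)


if __name__ == "__main__":
    # Each call may yield a different augmented variant of the same sentence.
    print(augment_sentence("the movie was surprisingly good", replace_prob=0.5))
```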

Citation (APA)

Jungiewicz, M., & Smywiński-Pohl, A. (2019). Towards textual data augmentation for neural networks: Synonyms and maximum loss. Computer Science, 20(1), 57–84. https://doi.org/10.7494/csci.2019.20.1.3023
