Data augmentation is one way to deal with the scarcity of labeled data and with overfitting. Both problems are crucial for modern deep-learning algorithms, which require massive amounts of data. Augmentation has been explored more thoroughly for image analysis than for text; this work is a step toward closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on the substitution of words using a thesaurus as well as Princeton University's WordNet. Our method improves upon the baseline in most cases. In terms of accuracy, the best variant is 1.2 percentage points (pp) better than the baseline.
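The word-substitution idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary `TOY_THESAURUS`, the function `augment`, and the replacement probability `p` are hypothetical names introduced here, and choosing synonyms uniformly at random is an assumption made only for illustration. The sketch does not attempt to reproduce the selection strategy or the synonym sources used in the paper.

```python
import random

# Hypothetical toy thesaurus, used only for illustration.
# A real setup would draw synonyms from a thesaurus or WordNet.
TOY_THESAURUS = {
    "good": ["fine", "great"],
    "movie": ["film", "picture"],
    "boring": ["dull", "tedious"],
}


def augment(sentence, thesaurus, rng, p=0.5):
    """Return a copy of `sentence` with some words swapped for synonyms.

    Each word that has an entry in `thesaurus` is replaced with
    probability `p` by a synonym chosen uniformly at random
    (an assumption of this sketch). Other words are kept unchanged.
    """
    out = []
    for word in sentence.split():
        synonyms = thesaurus.get(word.lower())
        if synonyms and rng.random() < p:
            out.append(rng.choice(synonyms))
        else:
            out.append(word)
    return " ".join(out)


# Generate a few augmented variants of one training sentence.
rng = random.Random(0)
variants = [
    augment("a good movie but boring", TOY_THESAURUS, rng) for _ in range(3)
]
```

Each variant keeps the label of the original sentence, so a classifier sees several surface forms of the same training example.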
Jungiewicz, M., & Smywiński-Pohl, A. (2019). Towards textual data augmentation for neural networks: Synonyms and maximum loss. Computer Science, 20(1), 57–84. https://doi.org/10.7494/csci.2019.20.1.3023