Towards textual data augmentation for neural networks: Synonyms and maximum loss

Michał Jungiewicz; Aleksander Smywiński-Pohl

Journal Article, Open Access

Computer Science (2019) 20(1) 57-84

DOI: 10.7494/csci.2019.20.1.3023

Abstract

Data augmentation is one way to deal with labeled-data scarcity and overfitting. Both problems are crucial for modern deep-learning algorithms, which require massive amounts of data. Augmentation is better explored in the context of image analysis than for text; this work is a step towards closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on the substitution of words using a thesaurus as well as Princeton University's WordNet. Our method improves upon the baseline in most cases. In terms of accuracy, the best variant is 1.2 percentage points better than the baseline.
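As an illustration of the word-substitution idea described in the abstract, the following is a minimal sketch and not the authors' implementation. The toy thesaurus, the function name `augment`, and the `replace_prob` parameter are all assumptions made for this example; the paper itself draws synonyms from a thesaurus and WordNet, and its selection strategy is not reproduced here.

```python
import random

# Hypothetical toy thesaurus used only for illustration; the paper relies on
# a real thesaurus and Princeton WordNet instead.
TOY_THESAURUS = {
    "good": ["fine", "great"],
    "movie": ["film", "picture"],
    "bad": ["poor", "awful"],
}


def augment(sentence, thesaurus, rng, replace_prob=0.5):
    """Return a copy of `sentence` with some words swapped for listed synonyms.

    Words without an entry in `thesaurus` are always kept unchanged.
    `replace_prob` (an assumed parameter) is the chance of replacing a word
    that does have synonyms.
    """
    out = []
    for word in sentence.split():
        candidates = thesaurus.get(word.lower())
        if candidates and rng.random() < replace_prob:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return " ".join(out)


rng = random.Random(0)  # fixed seed so repeated runs give the same variants
original = "a good movie with a bad ending"
variants = [augment(original, TOY_THESAURUS, rng) for _ in range(3)]
for variant in variants:
    print(variant)
```

In a sentence-classification setting, each generated variant would typically be added to the training set with the label of the sentence it was derived from, on the assumption that synonym substitution preserves the class.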

Cite

Jungiewicz, M., & Smywiński-Pohl, A. (2019). Towards textual data augmentation for neural networks: Synonyms and maximum loss. Computer Science, 20(1), 57–84. https://doi.org/10.7494/csci.2019.20.1.3023
