Data augmentation and semi-supervised learning for deep neural networks-based text classifier

Heereen Shim; Stijn Luca; Dietwig Lowet; Bart Vanrumste

Conference Proceedings, Open Access

Proceedings of the ACM Symposium on Applied Computing (2020) 1119-1126

DOI: 10.1145/3341105.3373992

15 Citations

24 Readers

Abstract

User feedback is essential for understanding user needs. In this paper, we use free-text responses from a survey on sleep-related issues to build a text classifier based on deep neural networks. Training such a model, however, requires a large amount of labelled data. To reduce manual labelling effort, we propose a method that combines data augmentation with pseudo-labelling: data augmentation is first applied to the labelled data to enlarge the initial training set, and the model trained on it is then used to annotate unlabelled data with pseudo-labels. The results show that the model with data augmentation achieves a macro-averaged F1 score of 65.2% using 4,300 training samples, whereas the model without data augmentation achieves a macro-averaged F1 score of 68.2% with around 14,000 training samples. Furthermore, when combined with pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7% using only 1,400 labelled training samples. In other words, the proposed method reduces the amount of labelled data needed for training while still achieving relatively good performance.
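The two ideas in the abstract, pseudo-labelling and the macro-averaged F1 score, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which augmentation technique, confidence rule, or model was used, so the random word-dropping in `augment`, the `threshold` parameter, and the `predict_proba` callable (assumed to return a label-to-probability dictionary) are all hypothetical placeholders.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def augment(text: str, rng: random.Random, p_drop: float = 0.1) -> str:
    """Hypothetical augmentation: drop each word with probability p_drop.

    A stand-in only; the abstract does not name the augmentation used.
    """
    kept = [w for w in text.split() if rng.random() >= p_drop]
    # Fall back to the original text if every word was dropped.
    return " ".join(kept) if kept else text


def pseudo_label(
    predict_proba: Callable[[str], Dict[str, float]],
    unlabelled: Sequence[str],
    threshold: float = 0.9,  # assumed confidence cut-off, not from the paper
) -> List[Tuple[str, str]]:
    """Annotate unlabelled texts with the model's most probable label,
    keeping only predictions whose probability reaches the threshold."""
    selected = []
    for text in unlabelled:
        probs = predict_proba(text)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            selected.append((text, label))
    return selected


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In a full pipeline under these assumptions, a model would be trained on the labelled texts plus their augmented copies, `pseudo_label` would then extend the training set with confidently predicted unlabelled texts, and `macro_f1` would be computed on a held-out labelled test set. Because macro-averaging weights every class equally, it is sensitive to performance on rare classes.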

Cite

Shim, H., Luca, S., Lowet, D., & Vanrumste, B. (2020). Data augmentation and semi-supervised learning for deep neural networks-based text classifier. In Proceedings of the ACM Symposium on Applied Computing (pp. 1119–1126). Association for Computing Machinery. https://doi.org/10.1145/3341105.3373992
