Data augmentation and semi-supervised learning for deep neural networks-based text classifier


Abstract

User feedback is essential for understanding user needs. In this paper, we use free-text responses from a survey on sleep-related issues to build a deep neural network-based text classifier. However, training a deep neural network requires a large amount of labelled data. To reduce manual data labelling, we propose a method that combines data augmentation and pseudo-labelling: data augmentation is applied to the labelled data to enlarge the initial training set, and the trained model is then used to annotate unlabelled data with pseudo-labels. The results show that the model with data augmentation achieves a macro-averaged F1 score of 65.2% using 4,300 training samples, whereas the model without data augmentation achieves a macro-averaged F1 score of 68.2% with around 14,000 training samples. Furthermore, when combined with pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7% using only 1,400 labelled training samples. In other words, the proposed method reduces the amount of labelled data needed for training while maintaining relatively good performance.
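The abstract outlines a two-stage pipeline: augment the labelled set, train a classifier, then pseudo-label unlabelled texts with the trained model. The sketch below illustrates that loop under stated assumptions; the paper uses a deep neural network, but a TF-IDF plus logistic regression classifier stands in here to keep the example short and runnable. The `augment()` word-dropout function, the 0.9 confidence threshold, and all names are illustrative assumptions, not the authors' implementation.

```python
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def augment(text, p=0.1):
    """Toy augmentation: randomly drop words (stand-in for the paper's augmentation)."""
    words = text.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else text

def train_with_pseudo_labels(X_labelled, y_labelled, X_unlabelled, threshold=0.9):
    # 1) Enlarge the labelled set with augmented copies.
    X_aug = list(X_labelled) + [augment(t) for t in X_labelled]
    y_aug = list(y_labelled) + list(y_labelled)

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_aug, y_aug)

    # 2) Pseudo-label unlabelled texts the model is confident about.
    probs = model.predict_proba(X_unlabelled)
    pseudo_X, pseudo_y = [], []
    for text, p in zip(X_unlabelled, probs):
        if p.max() >= threshold:
            pseudo_X.append(text)
            pseudo_y.append(model.classes_[p.argmax()])

    # 3) Retrain on labelled + augmented + pseudo-labelled data.
    model.fit(X_aug + pseudo_X, y_aug + pseudo_y)
    return model
```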

Citation (APA)

Shim, H., Luca, S., Lowet, D., & Vanrumste, B. (2020). Data augmentation and semi-supervised learning for deep neural networks-based text classifier. In Proceedings of the ACM Symposium on Applied Computing (pp. 1119–1126). Association for Computing Machinery. https://doi.org/10.1145/3341105.3373992
