User feedback is essential for understanding user needs. In this paper, we use free-text responses from a survey on sleep-related issues to build a text classifier based on deep neural networks. Training such a model, however, requires a large amount of labelled data. To reduce manual labelling effort, we propose a method that combines data augmentation and pseudo-labelling: data augmentation is applied to the labelled data to enlarge the initial training set, and the trained model is then used to annotate unlabelled data with pseudo-labels. The results show that the model trained with data augmentation achieves a macro-averaged F1 score of 65.2% using 4,300 training samples, whereas the model trained without data augmentation achieves a macro-averaged F1 score of 68.2% with around 14,000 training samples. Furthermore, when combined with pseudo-labelling, the model achieves a macro-averaged F1 score of 62.7% using only 1,400 labelled training samples. In other words, the proposed method reduces the amount of labelled data needed for training while still achieving relatively good performance.
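The two-stage idea described above (augment the labelled set, train, then pseudo-label unlabelled text) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which augmentation operation or confidence rule is used, so random word deletion and a fixed confidence threshold stand in for them, and a small naive Bayes classifier replaces the deep neural network so that the example stays self-contained. The names `augment`, `NaiveBayes`, and `train_with_pseudo_labels` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def augment(text, rng, n_copies=2, p_delete=0.2):
    """Return noisy copies of a non-empty text via random word deletion.

    Assumption: the abstract does not name the augmentation operation;
    random deletion is only a stand-in.
    """
    words = text.split()
    copies = []
    for _ in range(n_copies):
        kept = [w for w in words if rng.random() > p_delete]
        # Keep at least one word so a copy is never empty.
        copies.append(" ".join(kept or [rng.choice(words)]))
    return copies


class NaiveBayes:
    """Tiny multinomial naive Bayes, standing in for the deep model."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in text.split():
                if w in self.vocab:  # out-of-vocabulary words are ignored
                    score += math.log((counts[w] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def train_with_pseudo_labels(labelled, unlabelled, threshold=0.8, seed=0):
    """Augment labelled data, train, pseudo-label, then retrain."""
    rng = random.Random(seed)
    texts = [t for t, _ in labelled]
    labels = [y for _, y in labelled]
    # Stage 1: enlarge the initial training set with augmented copies.
    for text, label in labelled:
        for copy in augment(text, rng):
            texts.append(copy)
            labels.append(label)
    model = NaiveBayes().fit(texts, labels)
    # Stage 2: keep only confident predictions as pseudo-labels
    # (the threshold rule is an assumption of this sketch).
    pseudo = []
    for text in unlabelled:
        probs = model.predict_proba(text)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            pseudo.append((text, label))
    # Stage 3: retrain on labelled, augmented, and pseudo-labelled data.
    model = NaiveBayes().fit(
        texts + [t for t, _ in pseudo], labels + [y for _, y in pseudo]
    )
    return model, pseudo


# Made-up toy data, not taken from the survey described in the paper.
labelled = [
    ("cannot fall asleep at night", "insomnia"),
    ("wake up tired every morning", "fatigue"),
]
unlabelled = [
    "fall asleep late at night",
    "wake up tired every morning again",
    "xyz",
]
model, pseudo = train_with_pseudo_labels(labelled, unlabelled)
print(pseudo)
```

In this toy run the text with no known words ties between the classes, falls below the threshold, and is left unlabelled. The threshold value controls the trade-off between how much unlabelled data is used and how noisy the pseudo-labels are.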
Shim, H., Luca, S., Lowet, D., & Vanrumste, B. (2020). Data augmentation and semi-supervised learning for deep neural networks-based text classifier. In Proceedings of the ACM Symposium on Applied Computing (pp. 1119–1126). Association for Computing Machinery. https://doi.org/10.1145/3341105.3373992