The semi-supervised text classification (SSTC) task aims to train text classification models with few labeled data and massive unlabeled data. Recent work addresses this task with pseudo-labeling methods that assign pseudo-labels to unlabeled data as additional supervision. However, these models may suffer from incorrect pseudo-labels caused by underfit decision boundaries and from biased pseudo-labels generated on imbalanced data. We propose a prototype-guided semi-supervised model to address the above problems, which integrates a prototype-anchored contrasting strategy and a prototype-guided pseudo-labeling strategy. Specifically, prototype-anchored contrasting constructs class prototypes that cluster text representations of the same class, forcing them into high-density distributions and thus alleviating the underfitting of decision boundaries. Prototype-guided pseudo-labeling selects reliable pseudo-labeled data around the prototypes based on the data distribution, thus alleviating the bias from imbalanced data. Empirical results on four commonly used datasets demonstrate that our model is effective and outperforms state-of-the-art methods.
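To make the two strategies concrete, the following is a minimal sketch (not the paper's implementation) of prototype construction and prototype-guided pseudo-label selection: prototypes are taken as mean embeddings of labeled examples per class, and a fixed per-class quota of the unlabeled points closest to each prototype is selected, which limits the bias toward majority classes. The function names, the cosine-similarity choice, and the quota mechanism are assumptions for illustration.

```python
import numpy as np

def class_prototypes(embeddings, labels, num_classes):
    # Assumed prototype definition: mean embedding of labeled examples per class.
    return np.stack([embeddings[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def prototype_pseudo_labels(unlabeled, prototypes, per_class_quota):
    # Cosine similarity between each unlabeled embedding and each prototype.
    u = unlabeled / np.linalg.norm(unlabeled, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = u @ p.T                      # (num_unlabeled, num_classes)
    preds = sims.argmax(axis=1)         # pseudo-label = nearest prototype
    conf = sims.max(axis=1)             # similarity to that prototype
    selected = []
    for c in range(len(prototypes)):
        idx = np.where(preds == c)[0]
        # Keep only the quota of points closest to each prototype,
        # so no single class dominates the selected pseudo-labeled set.
        keep = idx[np.argsort(-conf[idx])[:per_class_quota]]
        selected.extend(keep.tolist())
    return np.array(sorted(selected)), preds
```

In an SSTC training loop, the selected indices and their pseudo-labels would be added to the supervised objective for the next round, while the remaining unlabeled points are left out until the prototypes (and hence the selection) improve.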
CITATION STYLE
Yang, W., Zhang, R., Chen, J., Wang, L., & Kim, J. (2023). Prototype-Guided Pseudo Labeling for Semi-Supervised Text Classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 16369–16382). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-long.904