Text classification is a mainstream research area in natural language processing, and improving classification performance when labeled samples are scarce is one of its most active topics. Current models for small-sample classification can learn from a small number of labels, but their classification results remain unsatisfactory. To improve classification accuracy, we propose a Small-sample Text Classification model based on the Pseudo-label fusion Clustering algorithm (STCPC). The algorithm has two core components: (1) It mines the latent features of unlabeled data with a training strategy that assigns pseudo-labels through clustering, and then reduces the noise in the pseudo-labeled dataset through consistency training with its augmented samples, which improves the quality of the pseudo-labeled dataset. (2) It augments the labeled data and then uses the Easy Plug-in Data Augmentation (EPiDA) framework to balance the diversity and quality of the augmented samples, enriching the labeled data in a controlled way. Comparison experiments against other classical algorithms show that the STCPC model effectively improves classification accuracy.
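The first component, pseudo-labeling by clustering followed by a consistency check against augmented samples, can be illustrated with a minimal sketch. It is not the STCPC implementation. It assumes bag-of-words features, a nearest-centroid assignment seeded by the labeled samples in place of the paper's clustering step, and a simple agreement filter in place of consistency training. All names (`bag_of_words`, `class_centroids`, `nearest_label`, `weak_augment`, `pseudo_label`), the `min_margin` threshold, and the example sentences are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Toy feature extractor (assumption): lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_centroids(labeled):
    """One centroid per class: the summed token counts of its labeled texts."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(bag_of_words(text))
    return centroids


def nearest_label(text, centroids, min_margin=0.05):
    """Label of the closest centroid, or None when the top two are too close.

    min_margin is a hypothetical threshold, not a value from the paper.
    """
    vec = bag_of_words(text)
    scored = sorted(
        ((cosine(vec, c), label) for label, c in centroids.items()),
        reverse=True,
    )
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_margin:
        return None
    return scored[0][1]


def weak_augment(text):
    """Stand-in augmentation (assumption): drop the first token."""
    tokens = text.split()
    return " ".join(tokens[1:]) if len(tokens) > 1 else text


def pseudo_label(labeled, unlabeled):
    """Keep a pseudo-label only if the text and its augmented copy agree."""
    centroids = class_centroids(labeled)
    kept = []
    for text in unlabeled:
        label = nearest_label(text, centroids)
        if label is not None and label == nearest_label(weak_augment(text), centroids):
            kept.append((text, label))
    return kept


labeled = [("great movie loved it", "pos"), ("terrible movie hated it", "neg")]
unlabeled = ["loved the acting great fun", "hated the plot terrible", "movie it"]
pseudo_labeled = pseudo_label(labeled, unlabeled)
```

In this toy run the ambiguous text `"movie it"` is equally close to both centroids and is discarded, while the other two texts receive pseudo-labels. A real system would replace the count vectors with learned representations and the agreement filter with a consistency loss during training.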
CITATION STYLE
Yang, L., Huang, B., Guo, S., Lin, Y., & Zhao, T. (2023). A Small-Sample Text Classification Model Based on Pseudo-Label Fusion Clustering Algorithm. Applied Sciences (Switzerland), 13(8). https://doi.org/10.3390/app13084716