A Small-Sample Text Classification Model Based on Pseudo-Label Fusion Clustering Algorithm

Linda Yang; Baohua Huang; Shiqian Guo; Yunjie Lin; Tong Zhao

Journal Article, Open Access


Applied Sciences (Switzerland) (2023) 13(8)

DOI: 10.3390/app13084716


Abstract

Text classification has long been a mainstream research branch of natural language processing, and improving classification performance when labeled samples are scarce is one of the active problems in this area. Current models that support small-sample classification can learn from only a few labeled examples, but their classification results are still unsatisfactory. To improve classification accuracy, we propose a Small-sample Text Classification model based on the Pseudo-label fusion Clustering algorithm (STCPC). The algorithm has two cores: (1) it mines the latent features of unlabeled data with a training strategy that assigns pseudo-labels under the clustering assumption, and then reduces noise in the pseudo-labeled dataset through consistency training with augmented versions of its samples, which improves the quality of that dataset; (2) it augments the labeled data and then uses the Easy Plug-in Data Augmentation (EPiDA) framework to balance the diversity and quality of the augmented samples, so that the labeled data is enriched in a controlled way. Comparison experiments against other classical algorithms show that the STCPC model effectively improves classification accuracy.
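
The abstract does not give the authors' implementation, so the following is only a minimal sketch of the idea behind core (1), under loudly stated assumptions. It uses a toy bag-of-words encoder, one centroid per class seeded from the labeled data in place of whatever clustering STCPC actually uses, and random word dropout as the sample enhancement. A pseudo-label is kept only when an augmented copy of the text falls into the same cluster as the original, which is a simple stand-in for the consistency-based noise reduction. All function names and parameters (`embed`, `assign`, `word_dropout`, `pseudo_label`, `rate`, `min_sim`) are hypothetical.

```python
import math
import random
from collections import Counter


def embed(text, vocab):
    """Toy encoder (assumption): L2-normalised bag-of-words over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def assign(vec, centroids):
    """Return (index, similarity) of the centroid with the largest dot product."""
    sims = [sum(a * b for a, b in zip(vec, c)) for c in centroids]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


def word_dropout(text, rate, rng):
    """Toy augmentation (assumption): drop tokens at random, never returning an empty text."""
    kept = [t for t in text.split() if rng.random() >= rate]
    return " ".join(kept) if kept else text


def pseudo_label(labeled, unlabeled, rate=0.3, min_sim=0.1, seed=0):
    """Assign unlabeled texts to class centroids seeded from labeled data and keep
    a pseudo-label only if it is confident and stable under augmentation."""
    rng = random.Random(seed)
    vocab = sorted({w for text, _ in labeled for w in text.lower().split()})
    classes = sorted({y for _, y in labeled})
    # One centroid per class: the mean embedding of that class's labeled texts.
    centroids = []
    for y in classes:
        vecs = [embed(t, vocab) for t, lab in labeled if lab == y]
        centroids.append([sum(col) / len(vecs) for col in zip(*vecs)])
    kept = []
    for text in unlabeled:
        c, sim = assign(embed(text, vocab), centroids)
        c_aug, _ = assign(embed(word_dropout(text, rate, rng), vocab), centroids)
        # Consistency filter: drop low-similarity or augmentation-unstable labels.
        if sim >= min_sim and c == c_aug:
            kept.append((text, classes[c]))
    return kept


if __name__ == "__main__":
    labeled = [("football match goal", "sports"), ("team scored goal", "sports"),
               ("python code compiler", "tech"), ("software code bug", "tech")]
    unlabeled = ["goal match team", "compiler bug code", "banana apple"]
    print(pseudo_label(labeled, unlabeled))
```

Here the text "banana apple" shares no words with the labeled data, so its similarity to every centroid is zero and it is rejected rather than given an arbitrary pseudo-label. In a realistic setting the encoder, the clustering step, and the augmentation would all be replaced by stronger components.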


Citation (APA)

Yang, L., Huang, B., Guo, S., Lin, Y., & Zhao, T. (2023). A Small-Sample Text Classification Model Based on Pseudo-Label Fusion Clustering Algorithm. Applied Sciences (Switzerland), 13(8). https://doi.org/10.3390/app13084716
