A Small-Sample Text Classification Model Based on Pseudo-Label Fusion Clustering Algorithm

Abstract

Text classification is a mainstream research branch of natural language processing, and improving classification performance when labeled samples are scarce is one of its active research problems. Current models that support small-sample classification can learn from a small number of labels, but their classification results are still unsatisfactory. To improve classification accuracy, we propose a Small-sample Text Classification model based on the Pseudo-label fusion Clustering algorithm (STCPC). The algorithm has two core components: (1) It mines the latent features of unlabeled data with a training strategy that assigns pseudo-labels by clustering, and then reduces the noise in the pseudo-labeled dataset through consistency training with augmented versions of its samples, which improves the quality of the pseudo-labeled dataset. (2) It augments the labeled data and then uses the Easy Plug-in Data Augmentation (EPiDA) framework to balance the diversity and quality of the augmented samples, enriching the labeled data in a controlled way. Comparison experiments against other classical algorithms show that the STCPC model can effectively improve classification accuracy.
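
The idea of assigning pseudo-labels by clustering can be illustrated with a minimal sketch. This is not the STCPC implementation: it assumes a simple seeded k-means, in which each cluster starts at the mean of one labeled class and therefore keeps that class name. The function names `class_centroids` and `pseudo_label` are hypothetical, and the toy 2-D vectors stand in for text embeddings.

```python
from math import dist

def class_centroids(points, labels):
    """Mean vector of the points in each class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}

def pseudo_label(labeled, labels, unlabeled, iterations=10):
    """Seeded k-means (an assumption, not the paper's method).

    Clusters are initialised at the labeled class means, so every
    cluster is tied to a class name from the start.
    """
    centroids = class_centroids(labeled, labels)
    assigned = []
    for _ in range(iterations):
        # Assignment step: each unlabeled point takes the class of its nearest centroid.
        assigned = [min(centroids, key=lambda y: dist(p, centroids[y])) for p in unlabeled]
        # Update step: recompute centroids from labeled points plus current pseudo-labels.
        centroids = class_centroids(list(labeled) + list(unlabeled), list(labels) + assigned)
    return assigned

# Toy data: two labeled samples and four unlabeled ones.
labeled = [(0.0, 0.0), (10.0, 10.0)]
labels = ["neg", "pos"]
unlabeled = [(1.0, 0.5), (9.0, 9.5), (0.5, 1.5), (10.5, 9.0)]
pseudo = pseudo_label(labeled, labels, unlabeled)
print(pseudo)  # ['neg', 'pos', 'neg', 'pos']
```

In a pipeline of the kind the abstract describes, pseudo-labels like these would still be noisy, which is why a subsequent noise-reduction step such as consistency training is needed before they are used for training.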

Cite

Yang, L., Huang, B., Guo, S., Lin, Y., & Zhao, T. (2023). A Small-Sample Text Classification Model Based on Pseudo-Label Fusion Clustering Algorithm. Applied Sciences (Switzerland), 13(8). https://doi.org/10.3390/app13084716
