A review of semi-supervised learning for text classification

99Citations
Citations of this article
200Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

A huge amount of data is generated daily leading to big data challenges. One of them is related to text mining, especially text classification. To perform this task we usually need a large set of labeled data that can be expensive, time-consuming, or difficult to be obtained. Considering this scenario semi-supervised learning (SSL), the branch of machine learning concerned with using labeled and unlabeled data has expanded in volume and scope. Since no recent survey exists to overview how SSL has been used in text classification, we aim to fill this gap and present an up-to-date review of SSL for text classification. We retrieve 1794 works from the last 5 years from IEEE Xplore, ACM Digital Library, Science Direct, and Springer. Then, 157 articles were selected to be included in this review. We present the application domain, datasets, and languages employed in the works. The text representations and machine learning algorithms. We also summarize and organize the works following a recent taxonomy of SSL. We analyze the percentage of labeled data used, the evaluation metrics, and obtained results. Lastly, we present some limitations and future trends in the area. We aim to provide researchers and practitioners with an outline of the area as well as useful information for their current research.

Cite

CITATION STYLE

APA

Duarte, J. M., & Berton, L. (2023). A review of semi-supervised learning for text classification. Artificial Intelligence Review, 56(9), 9401–9469. https://doi.org/10.1007/s10462-023-10393-8

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free