Learning from labeled and unlabeled documents: A comparative study on semi-supervised text classification

Carsten Lanquillon

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2000) 1910 490-497

DOI: 10.1007/3-540-45372-5_56

7 Citations

13 Readers

Abstract

Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising approach to reduce the need for labeled training documents. This paper compares three commonly applied text classifiers in the light of semi-supervised learning, namely a linear support vector machine, a similarity-based TF-IDF classifier, and a Naïve Bayes classifier. Results on real-world text datasets show that these learners may substantially benefit from using a large amount of unlabeled documents in addition to some labeled documents.
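To make the idea of learning from both labeled and unlabeled documents concrete, here is a minimal sketch in Python. The abstract does not say which semi-supervised procedure the paper uses, so this example assumes plain self-training around a multinomial Naive Bayes classifier: the model labels the unlabeled documents it is confident about, adds them to the training set, and retrains. The function names (`train_nb`, `predict_proba`, `self_train`), the confidence threshold, and the toy documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    model = {"classes": classes, "vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in classes:
        # Laplace smoothing with pseudo-count alpha per vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_prior"][c] = math.log(class_counts[c] / len(labels))
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict_proba(model, doc):
    """Return normalized class posteriors for one tokenized document."""
    scores = {}
    for c in model["classes"]:
        s = model["log_prior"][c]
        for w in doc:
            if w in model["vocab"]:  # words unseen in training are ignored
                s += model["log_lik"][c][w]
        scores[c] = s
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def self_train(labeled, labels, unlabeled, threshold=0.75, max_rounds=5):
    """Self-training (an assumed procedure, not necessarily the paper's)."""
    docs, ys, pool = list(labeled), list(labels), list(unlabeled)
    model = train_nb(docs, ys)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for doc in pool:
            probs = predict_proba(model, doc)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:  # adopt confident pseudo-labels
                docs.append(doc)
                ys.append(best)
                added += 1
            else:
                remaining.append(doc)
        if added == 0:  # nothing new was learned, so stop early
            break
        pool = remaining
        model = train_nb(docs, ys)
    return model


# Toy data (hypothetical): two labeled and four unlabeled documents.
labeled = [["goal", "match", "team"], ["election", "vote", "party"]]
labels = ["sport", "politics"]
unlabeled = [
    ["team", "coach", "goal"],
    ["vote", "senate", "party"],
    ["coach", "league"],
    ["senate", "law"],
]

baseline = train_nb(labeled, labels)
semi = self_train(labeled, labels, unlabeled)
query = ["coach", "league"]
print("labeled only:", predict_proba(baseline, query))
print("with unlabeled:", predict_proba(semi, query))
```

In this toy setting the labeled-only model has never seen the word "coach" and cannot separate the classes for the query, while the self-trained model picks up "coach" from a confidently pseudo-labeled document and leans toward "sport". The same wrapper could in principle sit around a linear SVM or a TF-IDF similarity classifier, as long as the base learner exposes some confidence score.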

Citation (APA)

Lanquillon, C. (2000). Learning from labeled and unlabeled documents: A comparative study on semi-supervised text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 1910, pp. 490–497). Springer Verlag. https://doi.org/10.1007/3-540-45372-5_56
