Semi-supervised text categorization by considering sufficiency and diversity

Shoushan Li; Sophia Yat Mei Lee; Wei Gao; Chu Ren Huang

Conference Proceedings

Communications in Computer and Information Science (2013), Vol. 400, pp. 105–115

DOI: 10.1007/978-3-642-41644-6_11

Citations: 3; Readers: 15

Abstract

In text categorization (TC), labeled data is often scarce while unlabeled data is plentiful. This motivates semi-supervised TC, which aims to improve performance by exploiting the knowledge in both labeled and unlabeled data. In this paper, we propose a novel bootstrapping approach to semi-supervised TC. First, we identify two basic preferences that a bootstrapping method should satisfy to be successful: sufficiency and diversity. Guided by the diversity preference, we modify the traditional bootstrapping algorithm so that the involved classifiers are trained on random feature subspaces instead of the whole feature space. We then further improve this random feature subspace-based bootstrapping by imposing constraints on subspace generation so that it better satisfies the diversity preference. Experimental evaluation on both topic-based and sentiment-based TC tasks demonstrates the effectiveness of the modified bootstrapping approach. © Springer-Verlag Berlin Heidelberg 2013.
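The abstract does not specify the base classifier, the subspace size, the selection rule, or the constraints used in the improved variant, so the following is only a minimal sketch of random feature subspace-based bootstrapping under stated assumptions: a multinomial naive Bayes base learner, subspaces covering half of the features, each subspace classifier moving its single most confident unlabeled document into the labeled set per round, and subspaces redrawn every round. The names `make_subspaces`, `SubspaceNB`, and `subspace_bootstrap` are hypothetical, and the `disjoint` option (a random partition of the features) is just one possible diversity constraint, not the one proposed by the authors.

```python
import math
import random
from collections import Counter


def make_subspaces(features, n_views, ratio, rng, disjoint=False):
    """Draw n_views random feature subspaces.

    disjoint=True is a HYPOTHETICAL diversity constraint (a random partition
    of the features); it is not taken from the paper.
    """
    feats = sorted(features)
    if disjoint:
        rng.shuffle(feats)
        return [set(feats[i::n_views]) for i in range(n_views)]
    k = max(1, int(len(feats) * ratio))
    return [set(rng.sample(feats, k)) for _ in range(n_views)]


class SubspaceNB:
    """Multinomial naive Bayes (Laplace smoothing) that only sees features in `subspace`."""

    def __init__(self, subspace):
        self.subspace = set(subspace)

    def fit(self, docs, labels):
        self.n = len(labels)
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.prior}
        for doc, y in zip(docs, labels):
            self.counts[y].update(w for w in doc if w in self.subspace)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict_proba(self, doc):
        v = len(self.subspace)
        logp = {}
        for c in self.prior:
            s = math.log(self.prior[c] / self.n)
            for w in doc:
                if w in self.subspace:
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            logp[c] = s
        top = max(logp.values())  # subtract the max log-score for numerical stability
        expd = {c: math.exp(s - top) for c, s in logp.items()}
        z = sum(expd.values())
        return {c: e / z for c, e in expd.items()}


def subspace_bootstrap(labeled, unlabeled, n_views=2, ratio=0.5, iters=5, seed=0):
    """Each round, every subspace classifier labels the unlabeled document
    it is most confident about and moves it into the labeled set."""
    rng = random.Random(seed)
    docs = [d for d, _ in labeled]
    ys = [y for _, y in labeled]
    pool = list(unlabeled)
    features = set().union(*docs, *pool)
    for _ in range(iters):
        if not pool:
            break
        # Assumption: subspaces are redrawn in every round.
        models = [SubspaceNB(s).fit(docs, ys)
                  for s in make_subspaces(features, n_views, ratio, rng)]
        taken = {}  # pool index -> predicted label
        for m in models:
            best = None
            for i, d in enumerate(pool):
                if i in taken:
                    continue
                label, p = max(m.predict_proba(d).items(), key=lambda kv: kv[1])
                if best is None or p > best[0]:
                    best = (p, i, label)
            if best is not None:
                taken[best[1]] = best[2]
        for i, label in taken.items():
            docs.append(pool[i])
            ys.append(label)
        pool = [d for i, d in enumerate(pool) if i not in taken]
    return docs, ys


# Toy usage: documents are sets of words (hypothetical data).
labeled = [({"goal", "match", "team"}, "sport"),
           ({"vote", "election", "party"}, "politics")]
unlabeled = [{"team", "coach", "goal"}, {"party", "minister", "vote"},
             {"match", "coach", "score"}, {"election", "minister", "campaign"}]
docs, ys = subspace_bootstrap(labeled, unlabeled)
final = SubspaceNB(set().union(*docs)).fit(docs, ys)  # final model on the full feature space
print(final.predict_proba({"goal", "team"}))
```

Training each classifier on a different random subspace is what gives the committee members different views of the same documents, which is the intuition behind the diversity preference; the final classifier here is simply retrained on the whole feature space, which is also an assumption.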

Cite (APA)

Li, S., Lee, S. Y. M., Gao, W., & Huang, C. R. (2013). Semi-supervised text categorization by considering sufficiency and diversity. In Communications in Computer and Information Science (Vol. 400, pp. 105–115). Springer Verlag. https://doi.org/10.1007/978-3-642-41644-6_11