Abstract
In traditional text classification approaches, the semantic meanings of the classes are described by the labeled documents. Since labeling documents is often time consuming and expensive, it is a promising idea that asking users to provide some keywords to depict the classes, instead of labeling any documents. However, short pieces of keywords may not contain enough information and therefore may lead to unreliable classifier. Fortunately, there are large amount of public data easily available in web directories, such as ODP, Wikipedia, etc. We are interested in exploring the enormous crowd intelligence contained in such public data to enhance text classification. In this paper, we propose a novel text classification framework called "Knowledge Supervised Learning"(KSL), which utilizes the knowledge in keywords and the crowd intelligence to learn the classifier without any labeled documents. We design a two-stage risk minimization (TSRM) approach for the KSL problem. It can optimize the expected prediction risk and build the high quality classifier. Empirical results verify our claim: our algorithm can achieve above 0.9 on Micro-F1 on average, which is much better than baselines and even comparable against SVM classifier supervised by labeled documents. © 2008 Springer Berlin Heidelberg.
Cite
CITATION STYLE
Zhang, C., Xue, G. R., & Yu, Y. (2008). Knowledge supervised text classification with no labeled documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5351 LNAI, pp. 509–520). https://doi.org/10.1007/978-3-540-89197-0_47
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.