Text categorization with class-based and corpus-based keyword selection

Arzucan Özgür; Levent Özgür; Tunga Güngör

Conference Proceedings

Text categorization with class-based and corpus-based keyword selection

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3733 LNCS 606-615

DOI: 10.1007/11569596_63

105Citations

123Readers

Get full text

Abstract

In this paper, we examine the use of keywords in text categorization with SVM. In contrast to the usual belief, we reveal that using keywords instead of all words yields better performance both in terms of accuracy and time. Unlike the previous studies that focus on keyword selection metrics, we compare the two approaches for keyword selection. In corpus-based approach, a single set of keywords is selected for all classes. In class-based approach, a distinct set of keywords is selected for each class. We perform the experiments with the standard Reuters-21578 dataset, with both boolean and tf-idf weighting. Our results show that although tf-idf weighting performs better, boolean weighting can be used where time and space resources are limited. Corpus-based approach with 2000 keywords performs the best. However, for small number of keywords, class-based approach outperforms the corpus-based approach with the same number of keywords. © Springer-Verlag Berlin Heidelberg 2005.

Author supplied keywords

Cite

CITATION STYLE

APA

Özgür, A., Özgür, L., & Güngör, T. (2005). Text categorization with class-based and corpus-based keyword selection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3733 LNCS, pp. 606–615). Springer Verlag. https://doi.org/10.1007/11569596_63

Text categorization with class-based and corpus-based keyword selection

Abstract

Author supplied keywords

Cite

Register to see more suggestions