Rough set feature selection methods for case-based categorization of text documents

Kalyan Moy Gupta; Philip G. Moore; David W. Aha; Sankar K. Pal

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3776 LNCS 792-798

DOI: 10.1007/11590316_128

Abstract

Textual case bases can contain thousands of features in the form of tokens or words, which can inhibit classification performance. Recent developments in rough set theory and its applications to feature selection offer promising approaches for selecting and reducing the number of features. We adapt two rough set feature selection methods for use on n-ary class text categorization problems. We also introduce a new method for selecting features that computes the union of features selected from randomly partitioned training subsets. Our comparative evaluation of our method with a conventional method on the Reuters-21578 data set shows that it can dramatically decrease training time without compromising classification accuracy. Also, we found that randomized training set partitions dramatically reduce training time. © Springer-Verlag Berlin Heidelberg 2005.
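
The partition-and-union idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function names, the `(tokens, label)` document representation, the way partitions are drawn, and above all the per-subset selector, which is a simple document-frequency ranking standing in for a rough set method. It is not the authors' implementation.

```python
import random
from collections import Counter


def select_top_k_by_doc_frequency(docs, k):
    """Stand-in selector (NOT a rough set method, assumed for illustration):
    keep the k tokens that occur in the most documents."""
    counts = Counter()
    for tokens, _label in docs:
        counts.update(set(tokens))  # count each token once per document
    # Sort by descending count, then alphabetically, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {token for token, _count in ranked[:k]}


def union_of_partition_features(docs, n_partitions, selector, seed=0):
    """Shuffle the training documents, split them into n_partitions subsets,
    run the selector on each subset, and return the union of the results."""
    rng = random.Random(seed)
    shuffled = list(docs)
    rng.shuffle(shuffled)
    # Strided slicing yields disjoint subsets of near-equal size.
    partitions = [shuffled[i::n_partitions] for i in range(n_partitions)]
    selected = set()
    for part in partitions:
        if part:  # skip empty subsets when n_partitions exceeds len(docs)
            selected |= selector(part)
    return selected


# Hypothetical toy corpus: (tokens, class label) pairs.
docs = [
    (["oil", "price", "rise"], "crude"),
    (["wheat", "crop", "price"], "grain"),
    (["oil", "export", "opec"], "crude"),
    (["grain", "wheat", "export"], "grain"),
]

features = union_of_partition_features(
    docs, 2, lambda part: select_top_k_by_doc_frequency(part, 2)
)
print(sorted(features))
```

In this sketch each subset is handled independently, so an expensive selector only ever sees a fraction of the training documents at a time. Whether that accounts for the training-time reduction reported in the abstract is not something the sketch demonstrates.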

Cite

Gupta, K. M., Moore, P. G., Aha, D. W., & Pal, S. K. (2005). Rough set feature selection methods for case-based categorization of text documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3776 LNCS, pp. 792–798). https://doi.org/10.1007/11590316_128
