Using typical testors for feature selection in text categorization

Aurora Pons-Porrata; Reynaldo Gil-García; Rafael Berlanga-Llavori

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4756 LNCS 643-652

DOI: 10.1007/978-3-540-76725-1_67

17 Citations

6 Readers

Abstract

A major difficulty of text categorization problems is the high dimensionality of the feature space. Thus, feature selection is often performed in order to increase both the efficiency and effectiveness of the classification. In this paper, we propose a feature selection method based on Testor Theory. This criterion takes into account inter-feature relationships. We experimentally compared our method with the widely used information gain using two well-known classification algorithms: k-nearest neighbour and Support Vector Machine. Two benchmark text collections were chosen as the testbeds: Reuters-21578 and Reuters Corpus Version 1 (RCV1-v2). We found that our method consistently outperformed information gain for both classifiers and both data collections, especially when aggressive feature selection is carried out. © Springer-Verlag Berlin Heidelberg 2007.
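For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of information gain for a single binary term feature (term present or absent in a document). It is not the authors' implementation and does not show the testor-based method; the function names `entropy` and `information_gain` and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(term_present, labels):
    """Reduction in class entropy from knowing whether a term occurs.

    term_present: one boolean per document (hypothetical input format).
    labels: one class label per document, aligned with term_present.
    """
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        # Class labels of the documents where the term is present/absent.
        subset = [y for x, y in zip(term_present, labels) if x == value]
        if subset:
            gain -= (len(subset) / total) * entropy(subset)
    return gain


# Toy example: four documents, two classes.
labels = ["grain", "grain", "trade", "trade"]
perfect_term = [True, True, False, False]   # occurs only in "grain" documents
useless_term = [True, False, True, False]   # occurs equally in both classes
```

Information gain scores each term independently of the others. The abstract contrasts this with the proposed criterion, which takes inter-feature relationships into account.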

Cite

Pons-Porrata, A., Gil-García, R., & Berlanga-Llavori, R. (2007). Using typical testors for feature selection in text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4756 LNCS, pp. 643–652). https://doi.org/10.1007/978-3-540-76725-1_67
