A major difficulty of text categorization problems is the high dimensionality of the feature space. Thus, feature selection is often performed in order to increase both the efficiency and effectiveness of the classification. In this paper, we propose a feature selection method based on Testor Theory. This criterion takes into account inter-feature relationships. We experimentally compared our method with the widely used information gain using two well-known classification algorithms: k-nearest neighbour and Support Vector Machine. Two benchmark text collections were chosen as the testbeds: Reuters-21578 and Reuters Corpus Version 1 (RCV1-v2). We found that our method consistently outperformed information gain for both classifiers and both data collections, especially when aggressive feature selection is carried out. © Springer-Verlag Berlin Heidelberg 2007.
CITATION STYLE
Pons-Porrata, A., Gil-García, R., & Berlanga-Llavori, R. (2007). Using typical testors for feature selection in text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4756 LNCS, pp. 643–652). https://doi.org/10.1007/978-3-540-76725-1_67
Mendeley helps you to discover research relevant for your work.