Using typical testors for feature selection in text categorization

17Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

A major difficulty of text categorization problems is the high dimensionality of the feature space. Thus, feature selection is often performed in order to increase both the efficiency and effectiveness of the classification. In this paper, we propose a feature selection method based on Testor Theory. This criterion takes into account inter-feature relationships. We experimentally compared our method with the widely used information gain using two well-known classification algorithms: k-nearest neighbour and Support Vector Machine. Two benchmark text collections were chosen as the testbeds: Reuters-21578 and Reuters Corpus Version 1 (RCV1-v2). We found that our method consistently outperformed information gain for both classifiers and both data collections, especially when aggressive feature selection is carried out. © Springer-Verlag Berlin Heidelberg 2007.

Cite

CITATION STYLE

APA

Pons-Porrata, A., Gil-García, R., & Berlanga-Llavori, R. (2007). Using typical testors for feature selection in text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4756 LNCS, pp. 643–652). https://doi.org/10.1007/978-3-540-76725-1_67

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free