A general framework of feature selection for text categorization


Abstract

Many feature selection methods have been proposed for text categorization. However, their performance is usually verified only by experiments, so the results depend on the corpora used and may not be accurate. This paper proposes a novel feature selection framework called Distribution-Based Feature Selection (DBFS), based on the distribution difference of features. The framework generalizes most state-of-the-art feature selection methods, including OCFS, MI, ECE, IG, CHI and OR. Using components of the framework, the performance of many feature selection methods can be estimated by theoretical analysis. DBFS also sheds light on the merits and drawbacks of many existing feature selection methods and helps to select suitable methods for specific domains. Moreover, a weighted model based on DBFS is given, from which feature selection methods suitable for unbalanced datasets can be derived. The experimental results show that the derived methods are more effective than CHI, IG and OCFS on both balanced and unbalanced datasets. © 2009 Springer Berlin Heidelberg.
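
For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of the standard CHI (chi-square) score of a term for one category, computed from a 2x2 table of document counts. It is not the DBFS framework from the paper; the function name, argument names, and toy counts are hypothetical and chosen only for illustration.

```python
def chi_square(a, b, c, d):
    """Standard CHI score of a term for a category (illustrative sketch, not DBFS).

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero marginal means the term or category carries no signal here.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical toy counts: "goal" appears only inside the category,
# "the" appears equally often inside and outside it.
counts = {"goal": (10, 0, 0, 10), "the": (5, 5, 5, 5)}
ranked = sorted(counts, key=lambda t: chi_square(*counts[t]), reverse=True)
```

A selection method of this kind keeps the top-ranked terms; here the term concentrated in one category ranks above the evenly spread one.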

Cite


Jing, H., Wang, B., Yang, Y., & Xu, Y. (2009). A general framework of feature selection for text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5632 LNAI, pp. 647–662). https://doi.org/10.1007/978-3-642-03070-3_49
