A Framework of Feature Selection Methods for Text Categorization

Shoushan Li; Rui Xia; Chengqing Zong; Chu Ren Huang

Conference Proceedings


ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. (2009) 692-700

DOI: 10.3115/1690219.1690243


Abstract

In text categorization, feature selection (FS) is a strategy that aims to make text classifiers more efficient and accurate. However, when facing a new task, it is still difficult to quickly choose a suitable method from the many FS methods proposed in previous studies. In this paper, we propose a theoretical framework for FS methods based on two basic measurements: a frequency measurement and a ratio measurement. Six popular FS methods are then discussed in detail under this framework. Moreover, guided by this theoretical analysis, we propose a novel method called weighed frequency and odds (WFO), which combines the two measurements with trained weights. Experimental results on data sets from both topic-based and sentiment classification tasks show that the new method is robust across different tasks and different numbers of selected features.
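As a rough illustration of how a frequency measurement and a ratio measurement can be combined with a weight, the Python sketch below scores each term for a binary task and keeps the top k. It assumes that P(t|c) is estimated as Laplace-smoothed document frequency, that the ratio measurement is log(P(t|c)/P(t|c̄)), and that the two are combined as a weighted geometric mean controlled by a single weight `lam`. These choices, and the function names `term_probs`, `wfo_like_score`, and `select_features`, are hypothetical; the exact WFO formula and how its weights are trained should be taken from the paper itself.

```python
import math
from collections import Counter


def term_probs(docs, vocab, smoothing=1.0):
    """Estimate P(t | class) as Laplace-smoothed document frequency (assumption)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    n = len(docs)
    return {t: (df[t] + smoothing) / (n + 2 * smoothing) for t in vocab}


def wfo_like_score(p_pos, p_neg, lam):
    """Weighted geometric combination of a frequency term and a log-ratio term."""
    if p_pos <= p_neg:
        return 0.0  # term does not favor the positive class
    # log ratio is strictly positive here, so fractional exponents are safe
    return (p_pos ** lam) * (math.log(p_pos / p_neg) ** (1 - lam))


def select_features(pos_docs, neg_docs, k, lam=0.5):
    """Return the k highest-scoring terms for the positive class (ties broken alphabetically)."""
    vocab = {t for doc in pos_docs + neg_docs for t in doc}
    p_pos = term_probs(pos_docs, vocab)
    p_neg = term_probs(neg_docs, vocab)
    scores = {t: wfo_like_score(p_pos[t], p_neg[t], lam) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy sentiment data: each document is a list of tokens.
pos = [["great", "movie"], ["great", "plot"], ["fun", "movie"]]
neg = [["boring", "movie"], ["bad", "plot"], ["boring", "bad"]]

print(select_features(pos, neg, 3))  # ['great', 'fun', 'movie']
```

Setting `lam=1.0` ranks the terms that favor the class purely by their in-class frequency, while `lam=0.0` ranks them purely by the log ratio; intermediate values trade off the two measurements.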

Cite (APA)

Li, S., Xia, R., Zong, C., & Huang, C. R. (2009). A Framework of Feature Selection Methods for Text Categorization. In ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. (pp. 692–700). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1690219.1690243