A major characteristic of the text document classification problem is the extremely high dimensionality of text data. In this paper we present two algorithms for feature (word) selection for the purpose of text classification. We use sequential forward selection methods based on the improved mutual information criteria introduced by Battiti [1] and by Kwak and Choi [6] for non-textual data. These feature evaluation functions take into consideration how features work together. We discuss the performance of these evaluation functions compared to information gain, which evaluates features individually. We present experimental results using a naive Bayes classifier based on the multinomial model on the Reuters data set. Finally, we analyze the experimental results from various perspectives, including F1-measure, precision, and recall. Preliminary experimental results indicate the effectiveness of the proposed feature selection algorithms in a text classification problem. © Springer-Verlag 2004.
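The abstract describes sequential forward selection driven by a mutual-information criterion that accounts for inter-feature redundancy, in the style of Battiti's MIFS. A minimal sketch of that idea follows; the function names, the discrete presence/absence feature encoding, and the choice of beta are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete arrays."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy == 0.0:
                continue
            px = np.mean(x == xv)
            py = np.mean(y == yv)
            mi += pxy * np.log(pxy / (px * py))
    return mi

def mifs_forward_selection(X, y, k, beta=0.5):
    """Greedy sequential forward selection with a MIFS-style criterion:
    at each step pick the feature f maximizing
        I(C; f) - beta * sum_{s in S} I(f; s),
    where S is the set of already-selected features."""
    n_features = X.shape[1]
    # Relevance of each feature to the class label, computed once.
    relevance = [mutual_information(X[:, j], y) for j in range(n_features)]
    selected, remaining = [], list(range(n_features))
    while len(selected) < k and remaining:
        best_j, best_score = None, -np.inf
        for j in remaining:
            # Penalize redundancy with features chosen so far.
            redundancy = sum(mutual_information(X[:, j], X[:, s])
                             for s in selected)
            score = relevance[j] - beta * redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

In a text setting, each column of `X` would be a binary word-occurrence indicator and `y` the document category; the redundancy term is what distinguishes this family of criteria from plain information gain, which would score each word independently of the others.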
CITATION STYLE
Novovičová, J., Malík, A., & Pudil, P. (2004). Feature selection using improved mutual information for text classification. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3138, 1010–1017. https://doi.org/10.1007/978-3-540-27868-9_111