Feature selection using improved mutual information for text classification

Abstract

A major characteristic of the text document classification problem is the extremely high dimensionality of text data. In this paper we present two algorithms for feature (word) selection for the purpose of text classification. We use sequential forward selection methods based on the improved mutual information measures introduced by Battiti [1] and Kwak and Choi [6] for non-textual data. These feature evaluation functions take into account how features work together. Their performance is compared with that of information gain, which evaluates features individually. We present experimental results using a naive Bayes classifier based on the multinomial model on the Reuters data set. Finally, we analyze the experimental results from various perspectives, including F1-measure, precision and recall. Preliminary experimental results indicate the effectiveness of the proposed feature selection algorithms in a text classification problem. © Springer-Verlag 2004.
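The abstract only names the selection criteria; as a rough illustration (not the paper's exact formulation), the sketch below shows greedy sequential forward selection with a Battiti-style MIFS criterion, J(f) = I(C; f) − β · Σ_{s∈S} I(f; s), applied to a binary document-term matrix. The binary term representation, the β value, and all function names here are illustrative assumptions; the paper's improved criteria (e.g. Kwak and Choi's variant) refine this scoring.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information I(X; Y) between two discrete vectors (in nats)."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                p_x = np.mean(x == xv)
                p_y = np.mean(y == yv)
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

def mifs_select(X, y, k, beta=0.5):
    """
    Greedy sequential forward selection with a Battiti-style MIFS criterion:
    at each step pick the feature f maximizing
        I(C; f) - beta * sum_{s in S} I(f; s),
    where S is the set of already selected features.
    X: (n_docs, n_terms) binary term-occurrence matrix; y: class labels.
    """
    n_features = X.shape[1]
    # relevance of each term to the class variable
    relevance = np.array([mutual_information(X[:, j], y) for j in range(n_features)])
    selected, remaining = [], set(range(n_features))
    redundancy = np.zeros(n_features)  # running sum of I(f; s) over selected s
    for _ in range(k):
        # score every remaining candidate and take the best one
        best = max(remaining, key=lambda j: relevance[j] - beta * redundancy[j])
        selected.append(best)
        remaining.discard(best)
        # update redundancy terms with the newly selected feature
        for j in remaining:
            redundancy[j] += mutual_information(X[:, j], X[:, best])
    return selected

# Toy usage on random data (for illustration only)
rng = np.random.default_rng(0)
X = (rng.random((200, 30)) < 0.3).astype(int)   # binary document-term matrix
y = rng.integers(0, 2, size=200)                # binary class labels
print(mifs_select(X, y, k=5))
```

With β = 0 this degenerates to ranking terms by mutual information with the class alone (comparable to the information-gain baseline the paper evaluates against); larger β penalizes terms that are redundant with those already selected.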

Citation (APA)
Novovičová, J., Malík, A., & Pudil, P. (2004). Feature selection using improved mutual information for text classification. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3138, 1010–1017. https://doi.org/10.1007/978-3-540-27868-9_111
