Reconsidering mutual information based feature selection: A statistical significance view

17 Citations
22 Readers (Mendeley users who have this article in their library)

Abstract

Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low-dimensional MI quantities. The reason for using low-dimensional approximations has mostly been attributed to the difficulty of estimating high-dimensional MI from limited samples. In this paper, we argue from a different viewpoint: even given a very large amount of data, the high-dimensional MI objective remains problematic as a meaningful optimization criterion, due to its overfitting nature: the MI almost always increases as more features are added, leading to a trivial solution that includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigorously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
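The overfitting behavior the abstract describes (the plug-in estimate of joint MI is nonnegative and almost always grows as features are added) suggests a natural remedy: only accept a feature if its MI gain is statistically significant. The sketch below illustrates that general idea with a greedy forward selection gated by a permutation test on the conditional MI increment. It is a minimal illustration under assumptions of our own, not the authors' algorithm: the function names, the discrete plug-in entropy estimator, the permutation-based null, and the level alpha=0.05 are all choices made here for demonstration.

```python
import numpy as np

def entropy(*cols):
    """Plug-in entropy of the joint distribution of the given discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def cond_mi(x, y, z_cols):
    """I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z), plug-in estimate."""
    if not z_cols:
        return entropy(x) + entropy(y) - entropy(x, y)
    return (entropy(x, *z_cols) + entropy(y, *z_cols)
            - entropy(*z_cols) - entropy(x, y, *z_cols))

def forward_select(X, y, alpha=0.05, n_perm=200, seed=None):
    """Greedy forward selection that stops when no candidate's MI gain is
    significant under a permutation test (null hypothesis: the candidate is
    independent of y given the already-selected features)."""
    rng = np.random.default_rng(seed)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        z = [X[:, j] for j in selected]
        gains = {j: cond_mi(X[:, j], y, z) for j in remaining}
        best = max(gains, key=gains.get)
        # Null distribution: shuffling the candidate breaks its link to y
        # while preserving its marginal distribution.
        null = [cond_mi(rng.permutation(X[:, best]), y, z)
                for _ in range(n_perm)]
        p_val = (1 + sum(g >= gains[best] for g in null)) / (1 + n_perm)
        if p_val > alpha:
            break  # no significant gain left: stop instead of overfitting
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic check: only features 0 and 2 carry information about y.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(2000, 8))
y = X[:, 0] * 3 + X[:, 2]
print(forward_select(X, y, seed=1))  # typically [0, 2] or [2, 0]
```

Without the significance gate, the loop above would keep running: on finite samples the estimated conditional MI of even a pure-noise feature is almost always strictly positive, which is exactly the trivial all-features solution the abstract points at. The permutation test stands in here for the paper's statistical control; the paper itself develops dedicated local and global optimization algorithms around its test.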

Cite

APA

Vinh, N. X., Chan, J., & Bailey, J. (2014). Reconsidering mutual information based feature selection: A statistical significance view. In Proceedings of the National Conference on Artificial Intelligence (Vol. 3, pp. 2092–2098). AI Access Foundation. https://doi.org/10.1609/aaai.v28i1.8953
