Revisiting probability distribution assumptions for information theoretic feature selection

Abstract

Feature selection has been shown to be beneficial for many data mining and machine learning tasks, especially for big data analytics. Mutual Information (MI) is a well-known information-theoretic approach used to evaluate the relevance of feature subsets to class labels. However, estimating high-dimensional MI poses significant challenges. Consequently, a great deal of research has focused on using low-order MI approximations or computing a lower bound on MI called Variational Information (VI). These methods often require assumptions to be made about the probability distributions of features, so that the distributions are realistic yet tractable to compute. In this paper, we reveal two sets of distribution assumptions underlying many MI- and VI-based methods: Feature Independence Distribution and Geometric Mean Distribution. We systematically analyze their strengths and weaknesses and propose a logical extension called Arithmetic Mean Distribution, which leads to an unbiased and normalised estimation of probability densities. We conduct detailed empirical studies across a suite of 29 real-world classification problems and illustrate the improved prediction accuracy of our methods based on the identification of more informative features, thus providing support for our theoretical findings.
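For context, the sketch below illustrates the generic low-order approach the abstract refers to: ranking features by pairwise mutual information with the class label and keeping the top-k. This is only an illustration of MI-based filter feature selection (the dataset, k, and use of scikit-learn are assumptions for the example), not the paper's Arithmetic Mean Distribution estimator.

```python
# Minimal sketch: pairwise (low-order) MI-based feature ranking.
# Ignoring feature interactions is exactly the limitation that motivates
# higher-order MI/VI estimators such as those discussed in the paper.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)  # example dataset, not from the paper

# Estimate I(X_i; Y) for each feature independently (first-order approximation).
mi_scores = mutual_info_classif(X, y, random_state=0)

k = 10  # hypothetical number of features to keep
top_k = np.argsort(mi_scores)[::-1][:k]
print("Selected feature indices:", top_k)
```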

Citation (APA)

Sun, Y., Wang, W., Kirley, M., Li, X., & Chan, J. (2020). Revisiting probability distribution assumptions for information theoretic feature selection. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 5908–5915). AAAI press. https://doi.org/10.1609/aaai.v34i04.6050
