Conditional mutual information (CMI) maximization is a promising criterion for stepwise, computationally efficient feature selection, but it is difficult to apply in full because probability estimation is imprecise and the computational load is heavy. Many dimension-reduced CMI-based and mutual information (MI)-based methods have been reported to achieve state-of-the-art classification performance. However, dimension reduction introduces model deviations into the CMI and MI formulations of these methods. In this paper, we start from the full-dimensional CMI to address the feature selection problem, so as to retain the full inter-feature and feature-label mutual information when selecting new features. The cost function is approximated and simplified from a mathematical perspective to overcome the difficulty of maximizing the original full-dimensional CMI. A relationship is established between the proposed feature selection criterion and one based on the Hilbert-Schmidt independence criterion, which explains qualitatively how the new criterion achieves relevance maximization and redundancy minimization simultaneously. Experiments on real-world datasets demonstrate the superiority of the proposed method over existing ones.
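To make the stepwise criterion concrete, the sketch below implements the generic greedy selection rule the abstract refers to, f* = argmax_f I(X_f; Y | X_S), where S is the set of already-selected features. This is not the paper's full-dimensional method: it uses a simple histogram (discretization) estimator on a single discrete label, and the growing conditioning set illustrates exactly the imprecise probability estimation and heavy computational load the abstract describes. All function names (entropy, cmi, greedy_cmi_selection) and the synthetic data are illustrative assumptions, not from the paper.

```python
# Minimal sketch of stepwise CMI-driven feature selection (not the
# authors' full-dimensional method). Features and label are assumed
# to be discrete (pre-binned) so that joint probabilities can be
# estimated by counting.
import numpy as np

def entropy(*columns):
    """Joint Shannon entropy of discrete 1-D columns, in nats."""
    joint = np.stack(columns, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def cmi(x, y, z=None):
    """I(X; Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    if z is None or z.shape[1] == 0:
        return entropy(x) + entropy(y) - entropy(x, y)
    zc = [z[:, j] for j in range(z.shape[1])]
    return (entropy(x, *zc) + entropy(y, *zc)
            - entropy(x, y, *zc) - entropy(*zc))

def greedy_cmi_selection(X, y, k):
    """Pick k features, each maximizing CMI with the label given the
    features already selected. The conditioning set grows each step,
    so the joint-probability estimates degrade in high dimensions."""
    selected = []
    for _ in range(k):
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        Z = X[:, selected]
        scores = [cmi(X[:, f], y, Z) for f in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Usage on synthetic pre-binned data (single binary label for simplicity;
# the paper addresses the multi-label setting):
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(500, 10))   # 10 features, 4 bins each
y = (X[:, 0] + X[:, 3] > 3).astype(int)  # label depends on features 0 and 3
print(greedy_cmi_selection(X, y, 3))
```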
Sha, Z. C., Liu, Z. M., Ma, C., & Chen, J. (2021). Feature selection for multi-label classification by maximizing full-dimensional conditional mutual information. Applied Intelligence, 51(1), 326–340. https://doi.org/10.1007/s10489-020-01822-0