Ensemble feature selection is an ensemble learning method in which each base classifier is trained on the result of a feature selection step. It is an effective approach for high-dimensional, small-sample data such as microarray data; however, existing ensemble feature selection methods still fall short of accurate and stable classification performance. In this paper, we present a novel information theory-based diversity measure called Sum of Minimal Information Distance (SMID), which maximizes both the relevance between feature subsets and the class label and the diversity among feature subsets. We also propose a novel ensemble feature selection framework that satisfies this criterion: features that share more mutual information with the class label and show greater diversity from one another are retained. The different feature subsets, obtained by an incremental search method, are used to train base classifiers, and these classifiers are then aggregated into a consensus classifier by majority voting. Compared with three representative feature selection methods and five ensemble learning methods on ten microarray datasets, the experimental results show that the proposed method achieves higher classification accuracy than the other methods.
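The abstract does not give the SMID formula or the exact incremental search procedure, so the following is only a minimal Python sketch of the general pipeline it describes: select several feature subsets that balance mutual information with the class label against overlap with previously chosen subsets, train one base classifier per subset, and combine their predictions by majority voting. The overlap penalty `alpha * used`, the subset size `k`, and the choice of logistic regression as base learner are illustrative assumptions, not the authors' method.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression

def select_diverse_subsets(X, y, n_subsets=5, k=20, alpha=0.5, seed=0):
    """Greedy incremental search (placeholder for the SMID criterion):
    each subset prefers features with high mutual information to the class
    label, penalized by how often the feature appears in earlier subsets."""
    relevance = mutual_info_classif(X, y, random_state=seed)
    used = np.zeros(X.shape[1])             # selection count per feature
    subsets = []
    for _ in range(n_subsets):
        score = relevance - alpha * used    # relevance minus overlap penalty
        chosen = np.argsort(score)[::-1][:k]
        subsets.append(chosen)
        used[chosen] += 1
    return subsets

def ensemble_predict(X_train, y_train, X_test, subsets):
    """Train one base classifier per feature subset; aggregate by majority vote.
    Assumes class labels are encoded as non-negative integers 0..C-1."""
    votes = []
    for idx in subsets:
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_train[:, idx], y_train)
        votes.append(clf.predict(X_test[:, idx]))
    votes = np.stack(votes)                  # shape: (n_subsets, n_test_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

In this sketch the diversity between subsets is enforced only implicitly, by discounting already-used features; the paper's SMID measure instead quantifies information distance between subsets directly, which this placeholder does not capture.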
Cai, J., Luo, J., Liang, C., & Yang, S. (2017). A novel information theory-based ensemble feature selection framework for high-dimensional microarray data. International Journal of Performability Engineering, 13(5), 742–753. https://doi.org/10.23940/ijpe.17.05.p17.742753