Exploring the impact of purity gap gain on the efficiency and effectiveness of random forest feature selection

Abstract

The Random Forest (RF) classifier supports both wrapper and embedded feature selection, through the Mean Decrease Accuracy (MDA) and Mean Decrease Impurity (MDI) methods respectively. MDI is known to be biased towards predictor variables with multiple values, whilst MDA is stable in this regard. As such, MDA is generally the preferred option for RF-based feature selection, despite its higher computational overhead in comparison to MDI. This research seeks to simultaneously reduce the computational overhead and improve the effectiveness of RF feature selection. We propose two improvements to the MDI method to overcome its shortcomings. The first is to use our proposed Purity Gap Gain (PGG) measure, which emphasises computational efficiency, as an alternative to the Gini Importance (GI) metric. The second is to incorporate a Relative Mean Decrease Impurity (RMDI) score, which aims to offset the bias towards multi-valued predictor variables through random feature value permutations. Experiments are conducted on UCI datasets to establish the impact of PGG and RMDI on RF performance.
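For readers unfamiliar with the Gini Importance metric that MDI builds on, the following is a minimal sketch of the impurity decrease produced by a single binary split. MDI accumulates this kind of decrease per feature across the nodes and trees of the forest. The function names (`gini`, `gini_decrease`, `best_decrease`) and the toy data are assumptions made for illustration; this is not the paper's PGG or RMDI, neither of which is defined in the abstract.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease for the binary split `value <= threshold`.

    Computed as the parent impurity minus the size-weighted impurity
    of the two child nodes.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


def best_decrease(values, labels):
    """Largest impurity decrease over all candidate thresholds of one feature."""
    # Every distinct value except the largest gives a non-trivial split.
    candidates = sorted(set(values))[:-1]
    return max((gini_decrease(values, labels, t) for t in candidates), default=0.0)


# Hypothetical toy data: six samples, two classes.
labels = [0, 0, 0, 1, 1, 1]
informative = [1, 1, 1, 2, 2, 2]    # separates the classes perfectly
uninformative = [1, 2, 1, 2, 1, 2]  # nearly independent of the class

print(best_decrease(informative, labels))
print(best_decrease(uninformative, labels))
```

In this sketch the informative feature removes all of the parent impurity, while the uninformative one removes very little. Note that `best_decrease` searches one candidate threshold per distinct value, so a feature with more distinct values gets more chances to find a favourable split; this is one intuition for the multi-valued bias the abstract attributes to MDI.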

Cite

Gwetu, M. V., Tapamo, J. R., & Viriri, S. (2019). Exploring the impact of purity gap gain on the efficiency and effectiveness of random forest feature selection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11683 LNAI, pp. 340–352). Springer Verlag. https://doi.org/10.1007/978-3-030-28377-3_28
