On the behaviour of permutation-based variable importance measures in random forest clustering

Stefano Nembrini

Journal Article, Open Access

Journal of Chemometrics (2019) 33(8)

DOI: 10.1002/cem.3135

Abstract

Unsupervised random forest (RF) is a popular clustering method that can be implemented by artificially creating a two-class problem. Variable importance measures (VIMs) can be used to determine which variables are relevant for defining the RF dissimilarity, but they have received less attention in this setting than in the supervised case. Here, I show that the sampling schemes used to generate the artificial data, including the original one, can influence the behaviour of the permutation importance in a way that can affect conclusions on variable relevance, and I propose a solution: generating the artificial data using a Bayesian bootstrap keeps the desirable properties of the permutation VIM.
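The construction of the artificial two-class problem can be illustrated with a minimal sketch. It is not taken from the article: the function names (`marginal_sample`, `bayesian_bootstrap_sample`, `make_two_class_problem`) are hypothetical, and the sketch assumes that the "original" scheme resamples each variable independently from its observed values, and that the Bayesian bootstrap variant does the same with Dirichlet(1, ..., 1) weights over the observations. The article's exact procedure may differ.

```python
import random


def marginal_sample(column, rng):
    """Assumed 'original' scheme: draw n values with replacement,
    each observed value having equal probability."""
    return rng.choices(column, k=len(column))


def bayesian_bootstrap_sample(column, rng):
    """Assumed Bayesian bootstrap scheme: draw Dirichlet(1, ..., 1)
    weights over the observations, then resample with those weights."""
    # Normalised Exp(1) draws are distributed as Dirichlet(1, ..., 1).
    draws = [rng.expovariate(1.0) for _ in column]
    total = sum(draws)
    weights = [d / total for d in draws]
    return rng.choices(column, weights=weights, k=len(column))


def make_two_class_problem(rows, sampler, seed=0):
    """Stack the observed rows (label 1) on top of synthetic rows (label 0).

    Each variable is resampled independently of the others, so the
    synthetic class keeps the marginals but loses the dependence structure.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    synthetic_columns = [sampler(col, rng) for col in columns]
    synthetic_rows = [list(r) for r in zip(*synthetic_columns)]
    features = [list(r) for r in rows] + synthetic_rows
    labels = [1] * len(rows) + [0] * len(synthetic_rows)
    return features, labels


# Toy data with two perfectly dependent variables (illustrative values only).
toy_rows = [[1, 10], [2, 20], [3, 30], [4, 40]]
X, y = make_two_class_problem(toy_rows, bayesian_bootstrap_sample, seed=1)
```

A classifier such as a random forest would then be trained to separate label 1 from label 0, and a permutation importance computed on that classifier; that step is omitted here to keep the sketch free of external dependencies.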

Cite

Nembrini, S. (2019). On the behaviour of permutation-based variable importance measures in random forest clustering. Journal of Chemometrics, 33(8). https://doi.org/10.1002/cem.3135
