In feature selection algorithms, “stability” is the sensitivity of the chosen feature set to variations in the supplied training data. As such, it is analogous to the statistical variance of a predictor. However, unlike variance, stability has no unique definition, and numerous measures have been proposed over 15 years of literature. In this paper, instead of defining a new measure, we start from an axiomatic point of view and identify what properties would be desirable. Somewhat surprisingly, we find that the simple Pearson’s correlation coefficient has all the necessary properties, yet has somehow been overlooked in favour of more complex alternatives. Finally, we illustrate how the use of this measure in practice can provide better interpretability and more confidence in the model selection process. The data and software related to this paper are available at https://github.com/nogueirs/ECML2016.
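To make the idea concrete, the following is a minimal sketch (not the paper's reference implementation) of how Pearson's correlation could be used to score stability: each run of a feature selector on a resampled training set produces a binary selection mask, and stability is taken as the average pairwise correlation between those masks. The example masks are hypothetical.

```python
import itertools
import statistics


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def stability(masks):
    """Average pairwise Pearson correlation over binary selection masks.

    Each mask corresponds to one run of the feature selector on a
    resampled training set; entry i is 1 if feature i was selected.
    On binary vectors, Pearson's correlation reduces to the phi
    coefficient, so identical masks score 1 and unrelated masks
    score near 0.
    """
    pairs = list(itertools.combinations(masks, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selection masks from three runs over six features:
masks = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
print(stability(masks))  # a value in [-1, 1]; higher means more stable
```

A selector that returned the same feature set on every resample would score exactly 1, giving a direct, interpretable reading of how trustworthy the selected features are.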
Nogueira, S., & Brown, G. (2016). Measuring the stability of feature selection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9852 LNAI, pp. 442–457). Springer Verlag. https://doi.org/10.1007/978-3-319-46227-1_28