Genetic algorithms for subset selection in model-based clustering

Luca Scrucca

Book Chapter

In: Unsupervised Learning Algorithms. Springer International Publishing (2016), pp. 55–70

DOI: 10.1007/978-3-319-24211-8_3

Abstract

Model-based clustering assumes that the data observed can be represented by a finite mixture model, where each cluster is represented by a parametric distribution. The Gaussian distribution is often employed in the multivariate continuous case. The identification of the subset of relevant clustering variables enables a parsimonious number of unknown parameters to be achieved, thus yielding a more efficient estimate, a clearer interpretation and often improved clustering partitions. This paper discusses variable or feature selection for model-based clustering. Following the approach of Raftery and Dean (J Am Stat Assoc 101(473):168–178, 2006), the problem of subset selection is recast as a model comparison problem, and BIC is used to approximate Bayes factors. The criterion proposed is based on the BIC difference between a candidate clustering model for the given subset and a model which assumes no clustering for the same subset. Thus, the problem amounts to finding the feature subset which maximises such a criterion. A search over the potentially vast solution space is performed using genetic algorithms, which are stochastic search algorithms that use techniques and concepts inspired by evolutionary biology and natural selection. Numerical experiments using real data applications are presented and discussed.
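The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: each candidate subset is encoded as a binary inclusion vector, and a simple genetic algorithm (tournament selection, single-point crossover, bit-flip mutation, elitism) maximises a user-supplied fitness function. The names `ga_subset_search` and `toy_fitness`, and all parameter defaults, are assumptions made for this example. In particular, `toy_fitness` is only a stand-in: in the setting of the abstract, the fitness would be the BIC of a clustering model fitted to the selected variables minus the BIC of a no-clustering model on the same variables.

```python
import random


def ga_subset_search(fitness, n_features, pop_size=30, generations=40,
                     crossover_prob=0.8, mutation_prob=0.1, seed=0):
    """Maximise `fitness` over binary inclusion vectors with a simple GA.

    Returns (best_subset, history), where history[t] is the best fitness
    in generation t. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)

    def repair(bits):
        # The empty subset cannot be clustered, so switch one feature on.
        if any(bits):
            return bits
        j = rng.randrange(n_features)
        return bits[:j] + (1,) + bits[j + 1:]

    def tournament(pop, scores):
        # Binary tournament: keep the fitter of two randomly drawn individuals.
        a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[a] if scores[a] >= scores[b] else pop[b]

    # Bit j of an individual says whether feature j is in the subset.
    pop = [repair(tuple(rng.randint(0, 1) for _ in range(n_features)))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        best_idx = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[best_idx])
        new_pop = [pop[best_idx]]  # elitism: the current best always survives
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if n_features > 1 and rng.random() < crossover_prob:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1
            # Bit-flip mutation, applied independently to each position.
            child = tuple(b ^ 1 if rng.random() < mutation_prob else b
                          for b in child)
            new_pop.append(repair(child))
        pop = new_pop

    scores = [fitness(ind) for ind in pop]
    best_idx = max(range(pop_size), key=scores.__getitem__)
    history.append(scores[best_idx])
    return pop[best_idx], history


# Hypothetical stand-in for the BIC-difference criterion: it rewards the
# "relevant" features 0-2 and penalises every additional feature.
RELEVANT = {0, 1, 2}


def toy_fitness(bits):
    return sum(2.0 if j in RELEVANT else -1.0
               for j, b in enumerate(bits) if b)


best, history = ga_subset_search(toy_fitness, n_features=8)
print("selected features:", [j for j, b in enumerate(best) if b])
print("best fitness per generation (first 5):", history[:5])
```

Because the best individual is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next. To apply the sketch to real data, `toy_fitness` would be replaced by a function that fits a Gaussian mixture to the selected columns and returns the BIC difference; that step is deliberately left out here.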

Cite

Scrucca, L. (2016). Genetic algorithms for subset selection in model-based clustering. In Unsupervised Learning Algorithms (pp. 55–70). Springer International Publishing. https://doi.org/10.1007/978-3-319-24211-8_3
