A comparative study of several cluster number selection criteria


Abstract

The selection of the number of clusters is an important and challenging issue in cluster analysis. In this paper we perform an experimental comparison of several criteria for determining the number of clusters based on the Gaussian mixture model. The criteria that we consider include Akaike's information criterion (AIC), the consistent Akaike's information criterion (CAIC), the minimum description length (MDL) criterion, which formally coincides with the Bayesian inference criterion (BIC), and two model selection methods derived from Bayesian Ying-Yang (BYY) harmony learning: the harmony empirical learning criterion (BYY-HEC) and the harmony data smoothing criterion (BYY-HDS). We investigate these methods on synthetic data sets of different sample sizes and on the Iris data set. The experimental results show that BYY-HDS has the best overall success rate and clearly outperforms the other methods for small sample sizes. CAIC and MDL tend to underestimate the number of clusters, while AIC and BYY-HEC tend to overestimate it, especially for small sample sizes. © Springer-Verlag 2003.
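The three classical criteria compared here (AIC, CAIC, and MDL/BIC) can all be viewed as penalized negative log-likelihoods that are minimized over candidate cluster numbers k. The sketch below illustrates this selection procedure for full-covariance Gaussian mixtures; the log-likelihood values are made up for illustration and are not results from the paper.

```python
import math

def gmm_param_count(k, d):
    """Free parameters of a k-component, d-dimensional Gaussian mixture
    with full covariances: (k-1) mixing weights, k*d mean entries,
    and k*d*(d+1)/2 covariance entries."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def criterion_score(name, log_lik, p, n):
    """Penalized negative log-likelihood; smaller is better."""
    if name == "aic":
        return -2.0 * log_lik + 2.0 * p
    if name == "bic":  # MDL formally coincides with BIC
        return -2.0 * log_lik + p * math.log(n)
    if name == "caic":
        return -2.0 * log_lik + p * (math.log(n) + 1.0)
    raise ValueError(f"unknown criterion: {name}")

def select_k(log_liks, d, n, name):
    """Pick the k that minimizes the chosen criterion.
    log_liks maps k -> maximized log-likelihood of the fitted GMM."""
    return min(log_liks,
               key=lambda k: criterion_score(name, log_liks[k],
                                             gmm_param_count(k, d), n))

# Hypothetical maximized log-likelihoods for k = 1..4 on an
# n = 150, d = 4 data set (invented numbers, for illustration only).
log_liks = {1: -600.0, 2: -480.0, 3: -440.0, 4: -420.0}
n, d = 150, 4
for name in ("aic", "bic", "caic"):
    print(name, "selects k =", select_k(log_liks, d, n, name))
```

With these particular (invented) values the criteria disagree: AIC's weaker penalty selects k = 4, BIC selects k = 3, and CAIC's heavier penalty selects k = 2, mirroring the tendencies the abstract reports (AIC overestimating, CAIC underestimating).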

Citation (APA)

Hu, X., & Xu, L. (2004). A comparative study of several cluster number selection criteria. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2690, 195–202. https://doi.org/10.1007/978-3-540-45080-1_27
