Abstract
We propose a method (the "Gap statistic") for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or hierarchical clustering), comparing the change in within-cluster dispersion with that expected under a uniform null distribution. Some theory is developed for the proposal, and a simulation study shows that the Gap statistic usually outperforms other methods that have been proposed in the literature.
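The comparison described in the abstract can be illustrated with a minimal, standard-library-only sketch. Several choices here are assumptions made for illustration, not details stated in the abstract: the clustering step is a plain k-means with farthest-first initialisation, the uniform reference data are drawn over the bounding box of the observed points, and the selection rule (smallest k with Gap(k) >= Gap(k+1) - s(k+1)) is taken from the body of the published paper. All function names (`kmeans`, `log_dispersion`, `gap_statistic`) and parameters (`k_max`, `n_refs`, `seed`) are hypothetical.

```python
import math
import random


def nearest(points, centers):
    """Label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, rng, n_iter=30):
    """Plain Lloyd k-means (an illustrative stand-in for any clustering method)."""
    # Farthest-first initialisation: random first center, then repeatedly
    # add the point farthest from the centers chosen so far.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(n_iter):
        labels = nearest(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return nearest(points, centers)


def log_dispersion(points, k, rng):
    """log W_k: log of the pooled within-cluster sum of squares about cluster means."""
    labels = kmeans(points, k, rng)
    total = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            mean = tuple(sum(c) / len(members) for c in zip(*members))
            total += sum(math.dist(p, mean) ** 2 for p in members)
    return math.log(total)


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (estimated number of clusters, [Gap(1), ..., Gap(k_max)])."""
    rng = random.Random(seed)
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        # Assumption: reference data are uniform over the data's bounding box.
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in points]
            ref_logs.append(log_dispersion(ref, k, rng))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_dispersion(points, k, rng))
        s.append(sd * math.sqrt(1 + 1 / n_refs))
    # Smallest k such that Gap(k) >= Gap(k+1) - s(k+1); fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Two well-separated Gaussian blobs in the plane (synthetic demo data).
    data = [(demo_rng.gauss(cx, 0.5), demo_rng.gauss(0.0, 0.5))
            for cx in (0.0, 10.0) for _ in range(40)]
    k_hat, gap_values = gap_statistic(data, k_max=4)
    print(k_hat, [round(g, 2) for g in gap_values])
```

The sketch favours readability over speed; for real data one would likely swap in a library clustering routine and a larger number of reference draws.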
Cite
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411–423.