Estimating the number of data clusters via the gap statistic

Robert Tibshirani; Guenther Walther; Trevor Hastie

Journal Article

Journal of the Royal Statistical Society: Series B (2001) 63(Part 2) 411-423

Abstract

We propose a method (the "Gap statistic") for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or hierarchical), comparing the change in within-cluster dispersion to that expected under a uniform null distribution. Some theory is developed for the proposal, and a simulation study shows that the Gap statistic usually outperforms other methods that have been proposed in the literature.
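
The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes a basic Lloyd's k-means as the clustering algorithm, a uniform reference distribution drawn over the bounding box of the data, and the selection rule "smallest k with Gap(k) >= Gap(k+1) - s(k+1)", none of which are specified in the abstract itself. The function names `kmeans_dispersion` and `gap_statistic` and all parameters are hypothetical.

```python
import numpy as np

def kmeans_dispersion(X, k, rng, n_init=20, n_iter=50):
    """Smallest within-cluster sum of squares found by a basic Lloyd's k-means."""
    best = np.inf
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Keep the old center if a cluster becomes empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best

def gap_statistic(X, k_max=6, n_refs=10, seed=0):
    """Return (estimated number of clusters, Gap values for k = 1..k_max)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    ks = np.arange(1, k_max + 1)

    # Log dispersion of the observed data for each k.
    log_w = np.array([np.log(kmeans_dispersion(X, k, rng)) for k in ks])

    # Log dispersion of reference data sets (assumption: uniform over
    # the bounding box of X).
    ref_log_w = np.empty((n_refs, k_max))
    for b in range(n_refs):
        ref = rng.uniform(lo, hi, size=X.shape)
        ref_log_w[b] = [np.log(kmeans_dispersion(ref, k, rng)) for k in ks]

    # Gap(k) = mean reference log dispersion - observed log dispersion.
    gap = ref_log_w.mean(axis=0) - log_w
    # Standard deviation of the reference values, inflated for simulation error.
    s = ref_log_w.std(axis=0) * np.sqrt(1 + 1 / n_refs)

    # Assumed selection rule: smallest k with Gap(k) >= Gap(k+1) - s(k+1).
    for i in range(k_max - 1):
        if gap[i] >= gap[i + 1] - s[i + 1]:
            return int(ks[i]), gap
    return int(ks[-1]), gap
```

Calling `gap_statistic(X)` on an `(n_samples, n_features)` array returns the selected number of clusters together with the Gap curve, which is worth plotting against k before trusting the selected value. Any other clustering routine could be substituted for `kmeans_dispersion`, provided it returns a within-cluster dispersion for each k.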

Cite

Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of data clusters via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(Part 2), 411–423.
