Estimating the number of data clusters via the gap statistic

  • Tibshirani R
  • Walther G
  • Hastie T

Abstract

We propose a method (the "Gap statistic") for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or hierarchical), comparing the change in within-cluster dispersion to that expected under a uniform null distribution. Some theory is developed for the proposal, and a simulation study shows that the Gap statistic usually outperforms other methods that have been proposed in the literature.
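The comparison described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not spell out: within-cluster dispersion is taken as the k-means sum of squared distances to cluster centres, the uniform null is sampled over the bounding box of the data, and the number of clusters is chosen as the smallest k with Gap(k) >= Gap(k+1) - s(k+1). The function names (`kmeans_dispersion`, `gap_statistic`), the parameters (`n_refs`, `n_init`, `n_iter`), and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random

def kmeans_dispersion(points, k, rng, n_init=10, n_iter=50):
    """Smallest within-cluster sum of squares found by Lloyd's k-means
    over n_init random restarts (an illustrative clusterer, not the only choice)."""
    best = math.inf
    for _ in range(n_init):
        centers = rng.sample(points, k)
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: math.dist(p, centers[c]))
                clusters[j].append(p)
            # Move each centre to its cluster mean; keep it if the cluster is empty.
            new_centers = [
                tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
                for i, cl in enumerate(clusters)
            ]
            if new_centers == centers:
                break
            centers = new_centers
        w = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
        best = min(best, w)
    return best

def gap_statistic(points, k_max, n_refs=5, seed=0):
    """Return (estimated number of clusters, list of Gap(k) for k = 1..k_max)."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_dispersion(points, k, rng))
        # Reference data sets: uniform over the bounding box of the observed data.
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(kmeans_dispersion(ref, k, rng)))
        mean = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean - log_w)                    # Gap(k)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))  # s(k)
    # Smallest k with Gap(k) >= Gap(k+1) - s(k+1); fall back to k_max.
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps

# Toy data (assumed for illustration): two well-separated Gaussian blobs in 2D.
data_rng = random.Random(1)
data = [(data_rng.gauss(cx, 0.5), data_rng.gauss(0.0, 0.5))
        for cx in (0.0, 10.0) for _ in range(30)]
k_hat, gaps = gap_statistic(data, k_max=4)
```

On data like this, the gap is expected to rise sharply from k = 1 to k = 2 and then fall, because splitting a true cluster reduces dispersion less than splitting uniform reference data does. The sketch assumes distinct points and k smaller than the sample size, so that the dispersion is never zero when its logarithm is taken.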

Cite

Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of data clusters via the gap statistic. Journal of the Royal Statistical Society: Series B, 63(2), 411–423.
