Hierarchical model-based clustering for large datasets

  • Christian Posse


In recent years, hierarchical model-based clustering has provided promising results in a variety of applications. However, its use with large datasets has been hindered by time and memory requirements that grow at least quadratically in the number of observations. To overcome this difficulty, this article proposes starting the hierarchical agglomeration from an efficient classification of the data into many classes rather than from the usual set of singleton clusters. This initial partition is derived from a subgraph of the minimum spanning tree associated with the data. To this end, we develop graphical tools that assess the presence of clusters in the data and uncover observations that are difficult to classify. We apply this approach to two large, real datasets: a multiband MRI image of the human brain and data on global precipitation climatology. These datasets also serve to discuss ways of integrating spatial information into the clustering analysis. We focus on two-stage methods, in which a second stage of processing using established methods is applied to the output of the algorithm presented in this article, viewed as a first stage.

Copyright © 2001 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of America.
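The core idea of the initial partition can be sketched in code. This is a minimal illustration using SciPy, not the paper's exact procedure: the cut rule (drop the `n_cuts` longest edges of the minimum spanning tree) and the function name `mst_initial_partition` are assumptions standing in for the paper's subgraph construction and graphical diagnostics.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_initial_partition(X, n_cuts):
    """Coarse initial partition from a pruned minimum spanning tree.

    Illustrative sketch only: dropping the `n_cuts` longest MST edges
    is a simplification of the subgraph selection described in the paper.
    """
    D = squareform(pdist(X))                  # pairwise Euclidean distances
    mst = minimum_spanning_tree(D).toarray()  # n - 1 edges, stored one-way
    rows, cols = np.nonzero(mst)
    order = np.argsort(mst[rows, cols])
    keep = order[: len(order) - n_cuts]       # discard the longest edges
    adj = np.zeros_like(mst, dtype=bool)
    adj[rows[keep], cols[keep]] = True
    # Connected components of the pruned MST form the initial classes,
    # from which hierarchical model-based agglomeration would then proceed.
    n_classes, labels = connected_components(adj, directed=False)
    return n_classes, labels
```

Starting the agglomeration from these classes rather than from n singletons is what reduces the number of merge steps, and hence the time and memory cost, for large datasets.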

Author-supplied keywords

  • Discrete ICM algorithm
  • Gaussian mixture
  • Hierarchical clustering
  • Image classification
  • Minimum spanning tree
  • Sequential uniform plot
