Efficient parallel hierarchical clustering

Manoranjan Dash; Simona Petrutiu; Peter Scheuermann

Journal Article, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3149 363-371

DOI: 10.1007/978-3-540-27866-5_47

Abstract

Hierarchical agglomerative clustering (HAG) is a common clustering method that outputs a dendrogram showing all N levels of agglomeration, where N is the number of objects in the data set. High time and memory complexity is a major bottleneck in applying it to real-world problems. Parallel algorithms have been proposed in the literature to overcome these limitations, but, as this paper shows, existing parallel HAG algorithms are inefficient because they partition the data ineffectively. We first show that HAG follows a rule: most agglomerations have very small dissimilarity, and only a small portion towards the end have large dissimilarity. Partially overlapping partitioning (POP) exploits this principle to obtain efficient yet accurate HAG algorithms. The total number of dissimilarities is reduced by a factor close to the number of cells in the partition. We present pPOP, the parallel version of POP, implemented on a shared memory multiprocessor architecture. Extensive theoretical analysis and experimental results show that pPOP gives close to linear speedup and significantly outperforms the existing parallel algorithms in both CPU time and memory requirements.

Keywords: hierarchical agglomerative clustering, partitioning, parallel algorithm, shared memory architecture.

© Springer-Verlag 2004.
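
Two claims in the abstract can be illustrated with a minimal Python sketch. This is not the authors' POP or pPOP implementation: the single-link dissimilarity, the helper names (merge_dissimilarities, pair_count, partitioned_pair_count), the synthetic data, and the simplification to equal-sized cells with no overlap region are all assumptions made only for illustration. The first function records the dissimilarity of each merge in a naive HAG run, so the distribution of merge dissimilarities can be inspected. The other two compare the number of pairwise dissimilarities with and without partitioning.

```python
import math
import random


def merge_dissimilarities(points):
    """Naive single-link HAG (an assumed linkage, chosen for simplicity).

    Returns the dissimilarity of each of the N - 1 merges, in merge order.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Scan every pair of clusters for the smallest single-link distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return merges


def pair_count(n):
    """Number of pairwise dissimilarities among n objects."""
    return n * (n - 1) // 2


def partitioned_pair_count(n, cells):
    """Pairwise dissimilarities when n objects are split into equal-sized
    cells and only within-cell pairs are computed (overlap is ignored)."""
    base, extra = divmod(n, cells)
    sizes = [base + 1] * extra + [base] * (cells - extra)
    return sum(pair_count(s) for s in sizes)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: three loose 2-D blobs of 20 points each.
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
    points = [(cx + random.gauss(0, 1), cy + random.gauss(0, 1))
              for cx, cy in centers for _ in range(20)]
    merges = merge_dissimilarities(points)
    print("median merge dissimilarity:", sorted(merges)[len(merges) // 2])
    print("largest merge dissimilarity:", max(merges))
    full = pair_count(1000)
    split = partitioned_pair_count(1000, 10)
    print("reduction factor for 10 cells:", full / split)
```

Under these assumptions, 1000 objects require 499500 dissimilarities in total but only 49500 when split into 10 equal cells, a reduction factor slightly above 10. The real POP scheme also accounts for the overlapping regions between cells, which this sketch leaves out.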

Cite

Dash, M., Petrutiu, S., & Scheuermann, P. (2004). Efficient parallel hierarchical clustering. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3149, 363–371. https://doi.org/10.1007/978-3-540-27866-5_47
