Efficient group communication for large-scale parallel clustering

Abstract

Global communication requirements and load imbalance are major obstacles to exploiting the computational power of large-scale systems in some parallel data mining algorithms. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and reduce the communication cost of iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm in which the global communication requirement can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution, which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs.
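
To make the communication pattern concrete, the following is a minimal sketch of one iteration of the straightforward parallel formulation mentioned in the abstract, not of the proposed approach. It assumes the data are split into partitions that are simulated here as plain Python lists, and the global reduction is imitated by an element-wise sum over all partitions. The names `local_stats`, `global_reduce` and `kmeans_step` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]


def local_stats(points: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Local step on one partition: assign each point to its nearest
    centroid and accumulate per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the centroid with the smallest squared Euclidean distance.
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def global_reduce(partials: Sequence[Stats]) -> Stats:
    """Stand-in for the global reduction: element-wise sum of the partial
    sums and counts of every partition (assumption: simulated in-process)."""
    k, dim = len(partials[0][1]), len(partials[0][0][0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in partials:
        for j in range(k):
            counts[j] += part_counts[j]
            for d in range(dim):
                sums[j][d] += part_sums[j][d]
    return sums, counts


def kmeans_step(partitions: Sequence[Sequence[Point]],
                centroids: Sequence[Point]) -> List[Point]:
    """One iteration: local statistics, global reduction, centroid update.
    An empty cluster keeps its previous centroid."""
    k = len(centroids)
    sums, counts = global_reduce([local_stats(p, centroids) for p in partitions])
    return [tuple(x / counts[j] for x in sums[j]) if counts[j] else tuple(centroids[j])
            for j in range(k)]


partitions = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
centroids = [(1, 1), (9, 1)]
print(kmeans_step(partitions, centroids))
```

Because every partition contributes to every centroid update, each iteration in this sketch depends on a reduction over all partitions, which is the scalability bottleneck the abstract refers to. In this small example the result matches a single-partition (centralised) step on the same points.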

Cite

Pettinger, D., & Di Fatta, G. (2013). Efficient group communication for large-scale parallel clustering. Studies in Computational Intelligence, 446, 155–164. https://doi.org/10.1007/978-3-642-32524-3_20
