Ensemble learning based distributed clustering

Abstract

Data mining techniques such as clustering are usually applied to centralized data sets. At present, however, more and more data is generated and stored at local sites, and transmitting the entire local data sets to a server is often unacceptable due to performance considerations, privacy and security concerns, and bandwidth constraints. In this paper, we propose a distributed clustering model based on ensemble learning, which can analyze and mine distributed data sources to find global clustering patterns. A typical scenario of distributed clustering is a 'two-stage' process: clustering is first performed at the local sites and then at the global site. The local clustering results transmitted to the server site form an ensemble, and the combining schemes of ensemble learning use this ensemble to generate the global clustering results. In the model, generating global patterns from the ensemble is formulated as a combinatorial optimization problem. As an implementation of the model, a novel distributed clustering algorithm called DK-means is presented. Experimental results show that DK-means achieves results similar to those of K-means applied to the centralized data set, is scalable to varying data distributions across local sites, and demonstrates the validity of the model. © Springer-Verlag Berlin Heidelberg 2007.
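The abstract describes the two-stage flow in prose only. The sketch below illustrates that flow under stated assumptions: each local site runs its own clustering and ships only the resulting centroids to the server, and the server combines the ensemble of local centroids into global clusters. The function names local_clustering and global_combining are hypothetical, and the global stage here is a simple re-clustering of centroids with K-means, not the paper's combinatorial-optimization-based combining scheme, which the abstract does not specify.

```python
# Minimal sketch of the 'two-stage' distributed clustering flow: local clustering
# at each site, then combining the ensemble of local results at the global site.
import numpy as np
from sklearn.cluster import KMeans


def local_clustering(local_data: np.ndarray, k: int) -> np.ndarray:
    """Stage 1: cluster the data held at one local site; only centroids are returned."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(local_data)
    return model.cluster_centers_


def global_combining(local_centroids: list[np.ndarray], k: int) -> np.ndarray:
    """Stage 2: combine the ensemble of local centroids into k global centroids."""
    ensemble = np.vstack(local_centroids)  # all local results gathered at the server
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(ensemble)
    return model.cluster_centers_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 8.0]])
    # Three local sites, each holding its own mixture of the underlying clusters.
    sites = [
        np.vstack([rng.normal(loc=c, scale=0.6, size=(100, 2)) for c in centers])
        for _ in range(3)
    ]
    k = 3
    local_results = [local_clustering(data, k) for data in sites]  # only centroids leave the sites
    global_centroids = global_combining(local_results, k)
    print(global_centroids)
```

Because only centroids are transmitted, the communication cost is independent of the size of each local data set, which is the motivation the abstract gives for avoiding centralization.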

Citation (APA)

Ji, G., & Ling, X. (2007). Ensemble learning based distributed clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4819 LNAI, pp. 312–321). Springer Verlag. https://doi.org/10.1007/978-3-540-77018-3_32
