Ensemble learning based distributed clustering

Genlin Ji; Xiaohan Ling

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4819 LNAI 312-321

DOI: 10.1007/978-3-540-77018-3_32

18 Citations

9 Readers

Abstract

Data mining techniques such as clustering are usually applied to centralized data sets. Increasingly, however, data is generated and stored at local sites, and transmitting an entire local data set to a server is often unacceptable because of performance considerations, privacy and security concerns, and bandwidth constraints. In this paper, we propose a distributed clustering model based on ensemble learning that analyzes and mines distributed data sources to find global clustering patterns. Distributed clustering typically proceeds in two stages: clustering is performed first at the local sites and then at the global site. The local clustering results transmitted to the server site form an ensemble, and the combining schemes of ensemble learning use this ensemble to generate the global clustering results. In the model, generating global patterns from the ensemble is formulated as a combinatorial optimization problem. As an implementation of the model, we present a novel distributed clustering algorithm called DK-means. Experimental results show that DK-means achieves results similar to those of K-means run once on the centralized data set and that it remains scalable as the data distribution across local sites varies, which supports the validity of the model. © Springer-Verlag Berlin Heidelberg 2007.
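
The abstract does not give the details of DK-means, so the following is only a minimal sketch of the general two-stage idea it describes, not the authors' algorithm. It assumes each local site runs plain k-means and sends only its centroids and cluster sizes to the server, which then clusters those centroids weighted by size. All function names (`weighted_kmeans`, `local_stage`, `global_stage`) and the farthest-point initialization are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic init (an assumption): start at points[0], then repeatedly
    pick the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        )
    return centroids


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float],
                    k: int, iters: int = 100) -> Tuple[List[Point], List[float]]:
    """Lloyd's algorithm on weighted points (assumes k <= len(points)).
    Returns the centroids and the total weight assigned to each."""
    dim = len(points[0])
    centroids = _farthest_point_init(points, k)
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Keep the old centroid if a cluster received no weight.
        new = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0
               else centroids[j] for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids, totals


def local_stage(site_data: Sequence[Point], k: int):
    """Stage 1: a site clusters its own data and keeps only a summary."""
    return weighted_kmeans(site_data, [1.0] * len(site_data), k)


def global_stage(summaries, k: int):
    """Stage 2: the server clusters the pooled local centroids,
    weighting each centroid by the number of points it represents."""
    points = [c for cents, _ in summaries for c in cents]
    weights = [n for _, counts in summaries for n in counts]
    return weighted_kmeans(points, weights, k)
```

In this sketch only `k` centroids and `k` counts leave each site, so the raw records are never transmitted. Weighting by cluster size means a global centroid equals the mean of the underlying points whenever the local clusters do not straddle global cluster boundaries.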

Cite

Ji, G., & Ling, X. (2007). Ensemble learning based distributed clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4819 LNAI, pp. 312–321). Springer Verlag. https://doi.org/10.1007/978-3-540-77018-3_32
