Data mining techniques such as clustering are usually applied to centralized data sets. At present, more and more data is generated and stored in local sites, The transmission of the entire local data set to server is often unacceptable because of performance considerations, privacy and security aspects, and bandwidth constraints. In this paper, we propose a distributed clustering model based on ensemble learning, which could analyze and mine distributed data sources to find global clustering patterns. A typical scenario of the distributed clustering is a 'two-stage' course, i.e. firstly doing clustering in local sites and then in global site. The local clustering results transmitted to server site form an ensemble and combining schemes of ensemble learning use the ensemble to generate global clustering results. In the model, generating global patterns from ensemble is mathematically converted to be a combinatorial optimization problem. As an implementation for the model, a novel distributed clustering algorithm called DK-means is presented. Experimental results show that DK-means achieves similar results to K-means which clusters centralized data set at a time and is scalable to data distribution varying in local sites, and show validity of the model. © Springer-Verlag Berlin Heidelberg 2007.
CITATION STYLE
Ji, G., & Ling, X. (2007). Ensemble learning based distributed clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4819 LNAI, pp. 312–321). Springer Verlag. https://doi.org/10.1007/978-3-540-77018-3_32
Mendeley helps you to discover research relevant for your work.