A DP canopy K-Means algorithm for privacy preservation of hadoop platform

Tao Shang; Zheng Zhao; Zhenyu Guan; Jianwei Liu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10581 LNCS 189-198

DOI: 10.1007/978-3-319-69471-9_14

Abstract

The K-means algorithm for data mining has been combined with differential privacy (DP) preservation. Although this improves the security of the data, the selection of the number of clusters and of the initial center points remains blind and random. In this paper, we integrate an optimized Canopy algorithm with the DP K-means algorithm and apply it to the Hadoop platform. First, we optimize the Canopy algorithm according to the minimum and maximum principle and implement it with the functions of the MapReduce framework. Second, we use the resulting number and set of center points to implement the DP K-means algorithm on MapReduce. As a result, the improved Canopy algorithm optimizes the selection of the number of clusters and of their centers on the Hadoop platform, so the proposed K-means algorithm improves the security, usability, and computational efficiency.
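The two steps named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' MapReduce implementation: the function names (`max_min_centers`, `dp_kmeans`), the distance `threshold`, the reading of the "minimum and maximum principle" as max-min distance seeding, the assumption that coordinates lie in [0, 1], and the even split of the privacy budget across iterations are all assumptions made for illustration.

```python
import math
import random


def max_min_centers(points, threshold):
    """Choose initial centers by max-min distance (assumed interpretation).

    Repeatedly adds the point farthest from its nearest chosen center,
    stopping once that distance is no larger than `threshold`.
    """
    centers = [points[0]]
    while True:
        # Distance from each point to its nearest already-chosen center.
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] <= threshold:
            return centers
        centers.append(points[idx])


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_kmeans(points, centers, epsilon, iterations, rng):
    """K-means with Laplace noise added to per-cluster sums and counts."""
    dim = len(points[0])
    # Assumption: coordinates in [0, 1], so one point changes the
    # (sum, count) query by at most dim + 1 in L1 norm. The budget
    # epsilon is split evenly across iterations.
    scale = (dim + 1) * iterations / epsilon
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        for p in points:
            j = min(range(len(centers)),
                    key=lambda i: math.dist(p, centers[i]))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        new_centers = []
        for j, old in enumerate(centers):
            noisy_count = counts[j] + laplace(scale, rng)
            if noisy_count < 1:
                # Noisy count unusable: keep the previous center.
                new_centers.append(old)
                continue
            # Clamp to [0, 1]; post-processing does not weaken the guarantee.
            new_centers.append(tuple(
                min(1.0, max(0.0, (sums[j][d] + laplace(scale, rng)) / noisy_count))
                for d in range(dim)
            ))
        centers = new_centers
    return centers


points = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.13),
          (0.9, 0.9), (0.88, 0.9), (0.9, 0.87)]
seeds = max_min_centers(points, threshold=0.5)
result = dp_kmeans(points, seeds, epsilon=1.0, iterations=3,
                   rng=random.Random(0))
```

In this sketch the seeding step fixes both the number of clusters and their starting positions, which is the role the abstract assigns to the improved Canopy algorithm. A smaller `epsilon` gives stronger privacy at the cost of noisier centers.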

Cite

Shang, T., Zhao, Z., Guan, Z., & Liu, J. (2017). A DP canopy K-Means algorithm for privacy preservation of hadoop platform. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10581 LNCS, pp. 189–198). Springer Verlag. https://doi.org/10.1007/978-3-319-69471-9_14
