DPHKMS: An efficient hybrid clustering preserving differential privacy in Spark

Zhi Qiang Gao; Long Jun Zhang

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2018), 367-377

DOI: 10.1007/978-3-319-59463-7_37

11 Citations

12 Readers

Abstract

k-means is one of the most widely used clustering algorithms. However, when faced with massive data clustering tasks, traditional data mining approaches, and existing clustering mechanisms in particular, fail to withstand malicious attacks mounted with arbitrary background knowledge. This can violate individuals’ privacy and leak information through system resources and clustering outputs when untrusted code runs directly on the original data. To address this issue, this paper proposes a novel and effective hybrid k-means clustering method that preserves differential privacy in Spark, namely Differential Privacy Hybrid k-means (DPHKMS). We combine Particle Swarm Optimization and Cuckoo Search to select better initial cluster centroids on the big data computing platform Apache Spark. Furthermore, DPHKMS is implemented and theoretically proved to satisfy ε-differential privacy with determinative privacy budget allocation under the Laplace mechanism. Finally, experimental results on challenging benchmark data sets demonstrate that DPHKMS, while guaranteeing availability and scalability, significantly improves on existing variants of k-means and consistently outperforms state-of-the-art methods in terms of privacy preservation, verifying the effectiveness and advantages of incorporating heuristic swarm intelligence.
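
To illustrate what ε-differential privacy under the Laplace mechanism can look like for k-means, the following is a minimal single-machine sketch in plain Python. It is not the DPHKMS algorithm: the abstract does not specify the budget allocation, the swarm-based initialization, or the Spark implementation, so none of those are reproduced here. The sketch assumes every coordinate lies in [0, 1], splits the total budget evenly across iterations, and perturbs each cluster's coordinate sums and count with Laplace noise of scale (d + 1) / ε_iter, where d + 1 is the L1 sensitivity of the released sums and counts when one record is added or removed. The function names `laplace_noise`, `nearest`, and `dp_kmeans` are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def nearest(point, centroids):
    # Index of the centroid closest to `point` in squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def dp_kmeans(points, centroids, epsilon, iterations, seed=0):
    """Lloyd-style k-means with Laplace noise on per-cluster sums and counts.

    Assumptions (not taken from the paper): coordinates lie in [0, 1] and the
    total budget `epsilon` is split evenly over `iterations`.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    eps_iter = epsilon / iterations
    # One record changes one count by 1 and one sum vector by at most 1 per
    # coordinate, so the L1 sensitivity of everything released is dim + 1.
    scale = (dim + 1) / eps_iter

    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centroids]
        counts = [0.0] * len(centroids)
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1.0
            for d in range(dim):
                sums[j][d] += p[d]

        updated = []
        for j, old in enumerate(centroids):
            noisy_count = counts[j] + laplace_noise(scale, rng)
            if noisy_count < 1.0:
                # Too little (noisy) mass: keep the previous centroid.
                # This decision uses only the noisy count (post-processing).
                updated.append(list(old))
                continue
            updated.append([
                # Clamp back into the assumed data domain [0, 1].
                min(1.0, max(0.0, (sums[j][d] + laplace_noise(scale, rng)) / noisy_count))
                for d in range(dim)
            ])
        centroids = updated
    return centroids
```

With a small ε the noise scale grows and the centroids drift away from the non-private solution; with a very large ε the result approaches ordinary k-means. In a Spark setting the per-cluster sums and counts would be aggregated across partitions before the noise is added, but that step is outside the scope of this sketch.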

Cite

Gao, Z. Q., & Zhang, L. J. (2018). DPHKMS: An efficient hybrid clustering preserving differential privacy in Spark. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 6, pp. 367–377). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-319-59463-7_37
