Privacy has become a serious concern in data mining. Achieving adequate privacy is especially challenging when the scale of the problem is large. Fundamentally, designing a practical privacy-preserving data mining system involves tradeoffs among several factors, such as the privacy guarantee, the accuracy or utility of the mining result, the computational efficiency, and the generality of the approach. In this paper, we present PEM, a practical system that tries to strike the right balance among these factors. We use a combination of noise-based and noise-free techniques to achieve provable differential privacy at a low computational overhead while obtaining more accurate results than previous approaches. PEM provides an efficient private gradient descent that can be the basis for many practical data mining and machine learning algorithms, such as logistic regression, k-means, and Apriori. We evaluate these algorithms on three real-world open datasets in a cloud computing environment. The results show that PEM achieves good accuracy, high scalability, and low computation cost while maintaining differential privacy.
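To make the idea of a noise-based private gradient descent concrete, the following is a minimal sketch of one generic approach: clip each example's gradient and add Gaussian noise to the sum before taking a step. It is an illustration under stated assumptions, not PEM's actual protocol; the abstract does not describe PEM's noise mechanism or its noise-free component. All names here (`clip`, `noisy_gradient_step`, `train`) and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the privacy accounting that would turn a noise level into a formal differential privacy guarantee is omitted.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clip(vec, max_norm):
    # Scale vec down so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def noisy_gradient_step(w, X, y, lr, clip_norm, noise_multiplier, rng):
    # Sum the clipped per-example logistic-loss gradients.
    total = [0.0] * len(w)
    for xi, yi in zip(X, y):
        residual = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        g = clip([residual * xj for xj in xi], clip_norm)
        total = [t + gj for t, gj in zip(total, g)]
    # Assumption: Gaussian noise scaled to the clipping bound; the source
    # does not say which noise distribution PEM uses.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    # Average the noisy sum and take a gradient step.
    return [wj - lr * nj / len(X) for wj, nj in zip(w, noisy)]


def train(X, y, steps=100, lr=0.5, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(steps):
        w = noisy_gradient_step(w, X, y, lr, clip_norm, noise_multiplier, rng)
    return w
```

Clipping bounds how much any single record can change the summed gradient, which is what lets the noise scale be chosen independently of the data. Setting `noise_multiplier=0` recovers ordinary (non-private) clipped gradient descent, which is convenient for checking the logic.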
CITATION STYLE
Li, Y., Duan, Y., & Xu, W. (2017). PEM: A Practical Differentially Private System for Large-Scale Cross-Institutional Data Mining. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10535 LNAI, pp. 89–105). Springer Verlag. https://doi.org/10.1007/978-3-319-71246-8_6