Practical distributed privacy-preserving data analysis at large scale

Abstract

In this chapter we investigate practical technologies for security and privacy in data analysis at large scale. We motivate our approach by discussing the challenges and opportunities in light of current and emerging analysis paradigms on large data sets. In particular, we present a framework for privacy-preserving distributed data analysis that is practical for many real-world applications. The framework is called Peers for Privacy (P4P) and features a novel heterogeneous architecture and a number of efficient tools for performing private computation and offering security at large scale. It maintains three key properties, which are essential for real-world applications: (i) provably strong privacy; (ii) adequate efficiency at reasonably large scale; and (iii) robustness against realistic adversaries. The framework gains its practicality by decomposing data mining algorithms into a sequence of vector addition steps, which can be privately evaluated using efficient cryptographic tools, namely verifiable secret sharing over a small field (e.g., 32 or 64 bits), which have the same cost as regular, non-private arithmetic. This paradigm supports a large number of statistical learning algorithms, including SVD, PCA, k-means, ID3, and machine learning algorithms based on expectation-maximization, as well as all algorithms in the statistical query model (Kearns, Efficient noise-tolerant learning from statistical queries. In: STOC'93, San Diego, pp. 392-401, 1993). As a concrete example, we show how singular value decomposition, which is an extremely useful algorithm and the core of many data mining tasks, can be performed efficiently with privacy in P4P. Using real data, we demonstrate that P4P is orders of magnitude faster than other solutions.
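The vector addition step described in the abstract can be illustrated with a minimal sketch of plain additive secret sharing. This is not the P4P protocol itself: it omits the verification part of verifiable secret sharing, and the modulus (the Mersenne prime 2**31 - 1, chosen only because it fits in 32 bits), the two-server setup, and the function names `share_vector`, `add_vectors`, and `aggregate` are all assumptions made for illustration.

```python
import secrets

# Assumed modulus for illustration: the Mersenne prime 2**31 - 1,
# which fits in a 32-bit word. The chapter's actual field may differ.
PRIME = 2**31 - 1


def share_vector(vec, n_shares=2):
    """Split an integer vector into n_shares additive shares mod PRIME.

    Any n_shares - 1 of the shares are uniformly random; only the sum
    of all shares reveals the original vector.
    """
    if n_shares < 2:
        raise ValueError("need at least two shares")
    random_shares = [
        [secrets.randbelow(PRIME) for _ in vec] for _ in range(n_shares - 1)
    ]
    last = [
        (v - sum(s[i] for s in random_shares)) % PRIME
        for i, v in enumerate(vec)
    ]
    return random_shares + [last]


def add_vectors(a, b):
    """Element-wise addition mod PRIME (same cost as ordinary addition)."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


def aggregate(user_vectors, n_servers=2):
    """Sum user vectors without any single server seeing an input."""
    length = len(user_vectors[0])
    totals = [[0] * length for _ in range(n_servers)]
    for vec in user_vectors:
        # Each user sends one share to each (hypothetical) server.
        for k, share in enumerate(share_vector(vec, n_servers)):
            totals[k] = add_vectors(totals[k], share)
    # Only the per-server totals are combined to reveal the sum.
    result = [0] * length
    for total in totals:
        result = add_vectors(result, total)
    return result
```

In this sketch each server performs only modular additions on its shares, which is the sense in which the private computation costs about the same as non-private arithmetic. Sums are recovered correctly only while the true totals stay below the modulus.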

Cite

Duan, Y., & Canny, J. (2013). Practical distributed privacy-preserving data analysis at large scale. In Large-Scale Data Analytics (Vol. 9781461492429, pp. 219–252). Springer New York. https://doi.org/10.1007/978-1-4614-9242-9_8
