Practical distributed privacy-preserving data analysis at large scale

Yitao Duan; John Canny

Book Chapter

Springer New York, (2013), 219-252

DOI: 10.1007/978-1-4614-9242-9_8

7 Citations

8 Readers

Abstract

In this chapter we investigate practical technologies for security and privacy in data analysis at large scale. We motivate our approach by discussing the challenges and opportunities in light of current and emerging analysis paradigms on large data sets. In particular, we present a framework for privacy-preserving distributed data analysis that is practical for many real-world applications. The framework is called Peers for Privacy (P4P) and features a novel heterogeneous architecture and a number of efficient tools for performing private computation and offering security at large scale. It maintains three key properties, which are essential for real-world applications: (i) provably strong privacy; (ii) adequate efficiency at reasonably large scale; and (iii) robustness against realistic adversaries. The framework gains its practicality by decomposing data mining algorithms into a sequence of vector addition steps, which can be privately evaluated using efficient cryptographic tools, namely verifiable secret sharing over a small field (e.g., 32 or 64 bits), which have the same cost as regular, non-private arithmetic. This paradigm supports a large number of statistical learning algorithms, including SVD, PCA, k-means, ID3, and machine learning algorithms based on expectation-maximization, as well as all algorithms in the statistical query model (Kearns, Efficient noise-tolerant learning from statistical queries. In: STOC'93, San Diego, pp. 392-401, 1993). As a concrete example, we show how singular value decomposition, which is an extremely useful algorithm and the core of many data mining tasks, can be performed efficiently with privacy in P4P. Using real data, we demonstrate that P4P is orders of magnitude faster than other solutions.
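
The private vector addition step mentioned in the abstract can be illustrated with a minimal sketch of additive secret sharing. This is not the P4P protocol itself: it assumes two non-colluding parties, omits the verification part of verifiable secret sharing entirely, and uses an assumed modulus (the Mersenne prime 2^61 - 1, which fits in a 64-bit word). All names (`Q`, `share`, `add_vectors`, `private_sum`, `decode`) are hypothetical and chosen for illustration only.

```python
import secrets

# Assumed modulus: the Mersenne prime 2^61 - 1, which fits in a 64-bit word.
Q = 2**61 - 1


def share(vector):
    """Split an integer vector into two additive shares modulo Q.

    Each share on its own is uniformly random; only their sum
    (mod Q) reveals the original vector.
    """
    share_a = [secrets.randbelow(Q) for _ in vector]
    share_b = [(x - r) % Q for x, r in zip(vector, share_a)]
    return share_a, share_b


def add_vectors(vectors):
    """Element-wise sum of equal-length vectors modulo Q."""
    return [sum(column) % Q for column in zip(*vectors)]


def private_sum(user_vectors):
    """Sum user vectors without either party seeing an individual vector."""
    shares = [share(v) for v in user_vectors]
    total_a = add_vectors([a for a, _ in shares])  # work done by party A
    total_b = add_vectors([b for _, b in shares])  # work done by party B
    # Only the two aggregate shares are combined, yielding the overall sum.
    return add_vectors([total_a, total_b])


def decode(values):
    """Map field elements back to signed integers (assumes |sum| < Q / 2)."""
    return [v - Q if v > Q // 2 else v for v in values]
```

For example, `decode(private_sum([[1, 2, 3], [4, -5, 6], [7, 8, -9]]))` recovers the element-wise sum of the three vectors. Each party performs only ordinary modular additions on its shares, which is in the spirit of the abstract's remark that the private computation costs about the same as regular arithmetic.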

Cite

Duan, Y., & Canny, J. (2013). Practical distributed privacy-preserving data analysis at large scale. In Large-Scale Data Analytics (Vol. 9781461492429, pp. 219–252). Springer New York. https://doi.org/10.1007/978-1-4614-9242-9_8
