Re-identification Attack to Privacy-Preserving Data Analysis with Noisy Sample-Mean

Du Su; Hieu Tri Huynh; Ziao Chen; Yi Lu; Wenmiao Lu

Conference Proceedings (Open Access)

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2020) 1045-1053

DOI: 10.1145/3394486.3403148

Abstract

In mining sensitive databases, field-level security often prohibits access to the sensitive class attributes of individual records. Only aggregate, class-specific statistics may be released. We consider a common privacy-preserving data analytics scenario in which only a noisy sample mean of the class of interest can be queried. This practice is common in medical research and business analytics.

This paper studies the risk that revealing a noisy sample mean of a class allows the entire class to be re-identified. We formulate the re-identification attack as a generalized positive-unlabeled (PU) learning problem. Under this formulation, we prove that the risk function of the re-identification problem is closely related to that of learning with complete data. We show that, given a one-sided noisy sample mean, an effective re-identification attack can be built with existing PU learning algorithms.

We then propose a novel algorithm, growPU, which exploits a unique property of the sample mean. It consistently outperforms existing PU learning algorithms on the re-identification task. With a noiseless sample mean, growPU achieves re-identification accuracy of 93.6% on the MNIST dataset and 88.1% on an online behavioral dataset. With noise that guarantees 0.01-differential privacy, it achieves 91.9% on MNIST and 84.6% on the online behavioral dataset.
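The toy sketch below illustrates the threat model described in the abstract: a data holder releases only the (optionally noisy) mean of the sensitive class, and an attacker tries to recover which records belong to that class. Everything in it is an illustrative assumption, not the paper's setup:

- The function names `noisy_class_mean` and `greedy_reidentify` are hypothetical.
- The use of the Laplace mechanism, features bounded in [0, 1], a public class size, and replace-one neighbouring datasets are all assumptions.
- The attack is a simple greedy mean-matching heuristic. It is neither PU learning nor the authors' growPU algorithm.

```python
import numpy as np


def noisy_class_mean(X, labels, epsilon=None, rng=None):
    """Release the sample mean of the class of interest (label 1).

    Illustrative assumptions (not from the paper): features lie in [0, 1],
    the class size n is public, and neighbouring datasets differ by replacing
    one record of the class, so the mean has L1 sensitivity d / n and Laplace
    noise with scale d / (n * epsilon) gives epsilon-DP.
    epsilon=None returns the noiseless mean.
    """
    pos = X[labels == 1]
    n, d = pos.shape
    mean = pos.mean(axis=0)
    if epsilon is None:
        return mean
    rng = np.random.default_rng() if rng is None else rng
    return mean + rng.laplace(loc=0.0, scale=d / (n * epsilon), size=d)


def greedy_reidentify(X, released_mean, k):
    """Hypothetical baseline attack; this is NOT the paper's growPU.

    Assumes the attacker knows the class size k. It builds the guessed class
    one record at a time, always adding the record that moves the running
    mean of the selected set closest (Euclidean) to the released mean.
    """
    total = np.zeros(X.shape[1])
    available = np.ones(len(X), dtype=bool)
    selected = []
    for step in range(1, k + 1):
        # Running mean of the selected set if each record were added next.
        candidate_means = (total + X) / step
        dist = np.linalg.norm(candidate_means - released_mean, axis=1)
        dist[~available] = np.inf  # never select the same record twice
        i = int(np.argmin(dist))
        selected.append(i)
        total += X[i]
        available[i] = False
    return np.array(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 20 records of the sensitive class, 80 others, 5 features.
    X = np.vstack([rng.uniform(0.9, 1.0, (20, 5)),
                   rng.uniform(0.0, 0.1, (80, 5))])
    labels = np.array([1] * 20 + [0] * 80)
    truth = set(np.flatnonzero(labels == 1).tolist())
    for eps in (None, 1.0):
        released = noisy_class_mean(X, labels, epsilon=eps, rng=rng)
        guess = set(greedy_reidentify(X, released, k=20).tolist())
        print(f"epsilon={eps}: fraction re-identified = {len(truth & guess) / 20:.2f}")
```

In this deliberately well-separated toy data, the noiseless mean alone pins down the whole class. That is the kind of leakage the paper studies on real data with stronger methods. How much a given noise level protects the class depends on the mechanism and sensitivity analysis actually used, which this sketch only assumes.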

Cite (APA)

Su, D., Huynh, H. T., Chen, Z., Lu, Y., & Lu, W. (2020). Re-identification Attack to Privacy-Preserving Data Analysis with Noisy Sample-Mean. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1045–1053). Association for Computing Machinery. https://doi.org/10.1145/3394486.3403148