Provable de-anonymization of large datasets with sparse dimensions


Abstract

There is a significant body of empirical work on statistical de-anonymization attacks against databases containing micro-data about individuals, e.g., their preferences, movie ratings, or transaction data. Our goal is to analytically explain why such attacks work. Specifically, we analyze a variant of the Narayanan-Shmatikov algorithm that was used to effectively de-anonymize the Netflix database of movie ratings. We prove theorems characterizing mathematical properties of the database and the auxiliary information available to the adversary that enable two classes of privacy attacks. In the first attack, the adversary successfully identifies the individual about whom she possesses auxiliary information (an isolation attack). In the second attack, the adversary learns additional information about the individual, although she may not be able to uniquely identify him (an information amplification attack). We demonstrate the applicability of the analytical results by empirically verifying that the mathematical properties assumed of the database are actually true for a significant fraction of the records in the Netflix movie ratings database, which contains ratings from about 500,000 users. © 2012 Springer-Verlag.
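The scoring idea behind the Narayanan-Shmatikov style of attack can be sketched briefly. The adversary scores each anonymized record against her auxiliary information, weighting rare attributes more heavily (a match on an obscure movie is far more identifying than one on a blockbuster), and accepts the top match only if it stands out from the runner-up. The sketch below is a minimal, hypothetical illustration of that general weighted-scoring-plus-eccentricity pattern, not the algorithm analyzed in the paper; the function names, the similarity threshold, and the eccentricity parameter `phi` are all assumptions made here for illustration.

```python
import math
import statistics

def score(aux, record, support):
    """Weighted similarity of auxiliary info against one record.
    Attributes with small support (rated by few users) get larger
    weight, since matching on them is more identifying."""
    s = 0.0
    for attr, value in aux.items():
        # crude similarity: ratings within 1 point count as a match
        if attr in record and abs(record[attr] - value) <= 1.0:
            s += 1.0 / math.log(1 + support[attr])
    return s

def deanonymize(aux, records, phi=1.5):
    """Return the id of the record best matching aux, or None when the
    best score does not stand out from the runner-up (an eccentricity
    test, guarding against false matches). phi is a hypothetical
    threshold chosen here for illustration."""
    # support of each attribute: how many records contain it
    support = {}
    for rec in records.values():
        for attr in rec:
            support[attr] = support.get(attr, 0) + 1

    scores = {rid: score(aux, rec, support) for rid, rec in records.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return ranked[0] if ranked else None
    s1, s2 = scores[ranked[0]], scores[ranked[1]]
    sigma = statistics.pstdev(scores.values())
    if sigma == 0 or (s1 - s2) / sigma < phi:
        return None  # isolation fails: match is not eccentric enough
    return ranked[0]
```

On sparse data of this kind, an adversary holding only a handful of (attribute, value) pairs can often isolate a unique record, which is the phenomenon the paper's theorems characterize.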


Citation (APA)

Datta, A., Sharma, D., & Sinha, A. (2012). Provable de-anonymization of large datasets with sparse dimensions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7215 LNCS, pp. 229–248). https://doi.org/10.1007/978-3-642-28641-4_13
