The generation of synthetic data is widely considered a viable method for alleviating privacy concerns and for reducing the risk of identification and attribute disclosure in micro-data. The records in a synthetic dataset are artificially created and thus have no 1-to-1 correspondence with individuals in the original data. As a result, inferences about those individuals appear to be infeasible while the utility of the data can be kept at a high level. In this paper, we challenge this belief by interpreting the standard attacker model for attribute disclosure as a classification problem. We show how disclosure risk measures presented in recent publications may be compared to, or even reformulated as, machine learning classification models. Our overall goal is to empirically analyze attribute disclosure risk in synthetic data and to discuss its close relationship to data utility. Moreover, we improve the baseline for attribute disclosure risk from the attacker's perspective by applying variants of the RadiusNearestNeighbor and the EnsembleVote classifier.
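The attacker model described above can be sketched as a classification task: the attacker trains a classifier on the released synthetic records, using quasi-identifiers as features and the sensitive attribute as the label, and then predicts the sensitive attribute of known target individuals. The following is a minimal illustration using scikit-learn's `RadiusNeighborsClassifier` as a stand-in for the radius-based nearest-neighbor variant mentioned in the abstract; the toy data, radius value, and the rule generating the sensitive attribute are all hypothetical.

```python
# Hedged sketch: attribute disclosure framed as a classification problem.
# All data below is synthetic toy data; the radius and features are assumptions.
import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier

rng = np.random.default_rng(0)

# Rows are synthetic records; columns are quasi-identifiers the attacker knows.
X_release = rng.random((200, 3))
# Hypothetical binary sensitive attribute correlated with the first feature.
y_release = (X_release[:, 0] > 0.5).astype(int)

# The attacker fits the classifier on the released synthetic data.
# outlier_label handles targets with no neighbor inside the radius.
clf = RadiusNeighborsClassifier(radius=0.5, outlier_label=0)
clf.fit(X_release, y_release)

# Quasi-identifiers of target individuals known to the attacker.
X_targets = rng.random((10, 3))
# Predicted sensitive attribute values: the attribute disclosure attempt.
predictions = clf.predict(X_targets)
```

The fraction of correct predictions on real target individuals would then serve as an empirical disclosure risk measure, which is the baseline perspective the paper develops.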
Hittmeir, M., Mayer, R., & Ekelhart, A. (2020). A Baseline for Attribute Disclosure Risk in Synthetic Data. In CODASPY 2020 - Proceedings of the 10th ACM Conference on Data and Application Security and Privacy (pp. 133–143). Association for Computing Machinery, Inc. https://doi.org/10.1145/3374664.3375722