Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets

  • McClure D
  • Reiter J

Abstract

When releasing individual-level data to the public, statistical agencies typically alter data values to protect the confidentiality of individuals’ identities and sensitive attributes. When data undergo substantial perturbation, secondary data analysts’ inferences can be distorted in ways that they typically cannot determine from the released data alone. This is problematic, in that analysts have no idea if they should trust the results based on the altered data. To ameliorate this problem, agencies can establish verification servers, which are remote computers that analysts query for measures of the quality of inferences obtained from disclosure-protected data. The reported quality measures reflect the similarity between the analysis done with the altered data and the analysis done with the confidential data. However, quality measures can leak information about the confidential values, so that they too must be subject to disclosure protections. In this article, we discuss several approaches to releasing quality measures for verification servers when the public use data are generated via multiple imputation, also known as synthetic data. The methods can be modified for other stochastic perturbation methods.
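
The abstract does not specify which quality measures the verification server reports. One fidelity metric commonly used in this literature is the overlap between the confidence interval estimated from the synthetic data and the one estimated from the confidential data; the sketch below illustrates that idea only. The function name, the example numbers, and the choice of measure are assumptions for exposition, not the measure defined in the article.

```python
# Illustrative sketch only: one plausible quality measure a verification
# server could report. The interval-overlap measure and all names here
# are assumptions, not the authors' method.

def interval_overlap(conf_int, syn_int):
    """Average fractional overlap of two confidence intervals.

    conf_int: (lower, upper) computed from the confidential data
    syn_int:  (lower, upper) computed from the released synthetic data
    Returns a value near 1 when the intervals nearly coincide and
    near 0 (or negative) when they are far apart.
    """
    lc, uc = conf_int
    ls, us = syn_int
    shared = min(uc, us) - max(lc, ls)  # length of the common portion
    return 0.5 * (shared / (uc - lc) + shared / (us - ls))

# Example: a regression coefficient whose 95% CI is (1.2, 2.0) on the
# confidential data and (1.0, 1.9) on the synthetic data.
print(interval_overlap((1.2, 2.0), (1.0, 1.9)))  # ~0.83
```

As the abstract notes, a server could not return such a value unmodified: the measure itself depends on the confidential data and would have to be perturbed, coarsened, or otherwise disclosure-protected before release.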

Citation (APA)

McClure, D. R., & Reiter, J. P. (2012). Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets. Journal of Privacy and Confidentiality, 4(1). https://doi.org/10.29012/jpc.v4i1.616
