Data quality has become a pervasive challenge for organizations as they wrangle large, heterogeneous datasets to extract value. Existing data cleaning solutions have focused on scalable techniques to resolve inconsistencies quickly. However, despite the proliferation of sensitive, confidential user information, data privacy concerns have remained largely unexplored in data cleaning techniques. In this work, we present a new privacy-aware data cleaning framework that aims to resolve data inconsistencies while minimizing the amount of information disclosed. We present a set of data disclosure operations that facilitate the data cleaning process, and propose two information-theoretic measures for privacy loss and data utility that are used to correct inconsistencies in the data.
Huang, Y., & Chiang, F. (2015). Towards a unified framework for data cleaning and data privacy. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9419, pp. 359–365). Springer Verlag. https://doi.org/10.1007/978-3-319-26187-4_34