Machine learning in artificial intelligence relies on legitimate big data, and publishing such data raises a large number of privacy issues. m-Invariance is a fundamental privacy-preserving notion in microdata republication. Unfortunately, when applied to big data release, the existing generalization-based m-Invariance, which requires modifying the original microdata, suffers from loss of data utility and poor aggregate query performance. Furthermore, because the quasi-identifiers in big data are high-dimensional, the generalization operations become unaffordable, which makes the approach impractical. In this paper, we remedy these drawbacks to achieve m-Invariance in big data release. We first propose a new anatomy-based m-Invariance definition and framework, where the anatomy approach seeks to achieve privacy by breaking the correlations between the sensitive attributes and the non-sensitive identifiers. We next establish a series of criteria for anatomy to cope with republication caused by data dynamics. We then develop an algorithm to realize these ideas. Theoretical and experimental analyses confirm the advantages of our anatomy-based m-Invariance approach in terms of data utility, aggregate query accuracy, and the capacity to process high-dimensional quasi-identifiers in big data release.
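To make the anatomy idea concrete, the following is a minimal Python sketch of the generic technique only, not the algorithm proposed in the paper. It assumes the grouping of records is already given, publishes the quasi-identifiers and the sensitive values in two separate tables linked only by a group id, and checks that each group holds at least m distinct sensitive values, each occurring once. The function names `anatomize` and `is_m_unique`, the record layout, and the sample data are all hypothetical and chosen for illustration.

```python
from collections import Counter


def anatomize(records, group_of):
    """Split records into a quasi-identifier table (QIT) and a sensitive table (ST).

    records:  list of (quasi_identifiers, sensitive_value) pairs
    group_of: list of group ids, one per record (the grouping is assumed given)
    """
    # QIT keeps the quasi-identifiers unmodified, tagged only with a group id.
    qit = [(qi, gid) for (qi, _), gid in zip(records, group_of)]
    # ST lists, per group, each sensitive value and how often it occurs.
    counts = Counter((gid, s) for (_, s), gid in zip(records, group_of))
    st = sorted((gid, s, n) for (gid, s), n in counts.items())
    return qit, st


def is_m_unique(st, m):
    """Return True if every group has at least m sensitive values, each occurring once."""
    per_group = {}
    for gid, _, n in st:
        per_group.setdefault(gid, []).append(n)
    return all(
        len(ns) >= m and all(n == 1 for n in ns)
        for ns in per_group.values()
    )


# Invented sample data: (age, zip code) as quasi-identifiers, disease as sensitive value.
records = [
    ((23, "11000"), "flu"),
    ((27, "13000"), "bronchitis"),
    ((35, "59000"), "gastritis"),
    ((59, "12000"), "flu"),
]
group_of = [1, 1, 2, 2]  # hypothetical grouping, two records per group

qit, st = anatomize(records, group_of)
```

Because the quasi-identifiers are released without generalization, a query over them can be evaluated on exact values, while an observer can link a record only to the set of sensitive values in its group rather than to a single value.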
CITATION STYLE
Wang, H., Ma, W., Zheng, H., Liang, Z., & Wu, Q. (2019). Privacy-Preserving Sequential Data Publishing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11928 LNCS, pp. 596–614). Springer. https://doi.org/10.1007/978-3-030-36938-5_37