Privacy is a growing concern in the publication of datasets that contain sensitive information. Preventing privacy disclosure and providing useful information to legitimate users for data mining are conflicting goals. Generalization and randomized response methods have been proposed in the database community to tackle this problem. However, both postulate the same prior belief for all transactions, which may be an incorrect modeling assumption and lead to privacy breaches. Moreover, generalization and randomized response methods usually require a privacy-controlling parameter to set the tradeoff between privacy and data quality, which may put data publishers in a dilemma. In this paper, a novel privacy-preserving method for data publication is proposed, based on conditional probability distributions and machine learning techniques, that can achieve different prior beliefs for different transactions. A basic cross sampling algorithm and a complete cross sampling algorithm are designed for the settings of a single sensitive attribute and multiple sensitive attributes, respectively, and an improved complete algorithm is developed using Gibbs sampling to enhance data utility when data are insufficient. Our method offers a stronger privacy guarantee while, as shown in extensive experiments, retaining better data utility.
Liu, C., Chen, S., Zhou, S., Guan, J., & Ma, Y. (2019). A novel privacy preserving method for data publication. Information Sciences, 501, 421–435. https://doi.org/10.1016/j.ins.2019.06.022