Abstract
Frequent event mining is a fundamental task for extracting insight from an event sequence (a long sequence of events, each associated with a time point). However, it may expose sensitive events that leak confidential business knowledge or lead to intrusive inferences about groups of individuals. In this work, we aim to prevent this threat by deleting occurrences of sensitive events while preserving the utility of the event sequence. To quantify utility, we propose a model that captures the changes that deletion causes to the probability distribution of events across the sequence. Based on the model, we define the problem of sanitizing an event sequence as an optimization problem. Solving the problem is important for preserving the output of many mining tasks, including frequent pattern mining and sequence segmentation. It is also challenging, because the number of ways to apply deletion to the sequence is exponential. To optimally solve the problem when there is one sensitive event, we develop an efficient algorithm based on dynamic programming. The algorithm also forms the basis of a simple, iterative method that optimally sanitizes an event sequence when there are multiple sensitive events. Experiments on real and synthetic datasets show the effectiveness and efficiency of our method.
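To make the idea of deletion by dynamic programming concrete, here is a minimal sketch of a toy version of the problem. It is not the authors' algorithm or utility model, which the abstract does not specify. It assumes integer timestamps, fixed-width time windows, a given number `d` of occurrences to delete, and a squared-error cost that rewards keeping the sensitive event's per-window counts proportional to the original ones. The names `window_counts` and `plan_deletions` and all parameters are hypothetical.

```python
def window_counts(events, sensitive, width):
    """Count occurrences of `sensitive` in consecutive windows of `width` time units.

    `events` is a list of (timestamp, event) pairs with non-negative integer
    timestamps (an assumption of this sketch).
    """
    if not events:
        return []
    n_windows = max(t for t, _ in events) // width + 1
    counts = [0] * n_windows
    for t, e in events:
        if e == sensitive:
            counts[t // width] += 1
    return counts


def plan_deletions(counts, d):
    """Choose how many occurrences to delete in each window (d in total).

    Minimizes the sum over windows of (remaining - target)^2, where target is
    the original count scaled by the overall keep ratio. This cost is an
    assumed stand-in for a distribution-preserving utility model.
    Returns (deletions_per_window, total_cost).
    """
    total = sum(counts)
    if not 0 <= d <= total:
        raise ValueError("d must be between 0 and the number of occurrences")
    keep_ratio = (total - d) / total if total else 0.0
    inf = float("inf")

    # best[j]: minimum cost over the windows seen so far using exactly j deletions
    best = [0.0] + [inf] * d
    choices = []  # choices[i][j]: deletions in window i on the best path to state j
    for c in counts:
        target = c * keep_ratio
        new_best = [inf] * (d + 1)
        pick = [0] * (d + 1)
        for j in range(d + 1):
            for x in range(min(c, j) + 1):
                if best[j - x] == inf:
                    continue
                cost = best[j - x] + (c - x - target) ** 2
                if cost < new_best[j]:
                    new_best[j] = cost
                    pick[j] = x
        best = new_best
        choices.append(pick)

    # Walk back through the stored choices to recover the per-window plan.
    plan = []
    j = d
    for pick in reversed(choices):
        plan.append(pick[j])
        j -= pick[j]
    plan.reverse()
    return plan, best[d]


if __name__ == "__main__":
    events = [(0, "a"), (1, "s"), (2, "s"), (5, "s"), (6, "b"), (9, "s")]
    counts = window_counts(events, "s", width=5)
    print(counts, plan_deletions(counts, d=2))
```

The table has one state per (window, deletions used) pair, so the search is polynomial in the number of windows and `d` rather than exponential in the number of occurrences. That is the general appeal of dynamic programming for this kind of problem, whatever the exact cost function.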
Cite
Loukides, G., & Gwadera, R. (2015). Optimal event sequence sanitization. In SIAM International Conference on Data Mining 2015, SDM 2015 (pp. 775–783). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974010.87