Finding surprisingly frequent patterns of variable lengths in sequence data


Abstract

We address the problem of finding 'surprising' patterns of variable length in sequence data, where a surprising pattern is defined as a subsequence of a longer sequence whose observed frequency is statistically significant with respect to a given distribution. Finding statistically significant patterns in sequence data is the core task in applications such as biological motif discovery and anomaly detection. We show that the presence of a few 'true' surprising patterns in the data can cause a large number of highly correlated patterns to appear statistically significant solely because of those few patterns. Our approach to this 'redundant patterns' problem is based on capturing the dependencies between patterns through an 'explain' relationship, in which a set of patterns can explain the statistical significance of another pattern. This allows us to address redundancy by choosing a few 'core' patterns that explain the significance of all other significant patterns. We propose a greedy algorithm that efficiently approximates a core pattern set of minimum size. Using both synthetic and real-world sequential data from different domains, including medicine and bioinformatics, we show that the proposed notion of core patterns very closely matches the notion of 'true' surprising patterns in the data.
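
The idea of greedily choosing a small set of 'core' patterns can be illustrated with a generic set-cover heuristic. The following is a minimal sketch and not the authors' algorithm: the function name `greedy_core`, the `explains` mapping, and the example patterns are all hypothetical, and the sketch assumes the simplification that a single pattern (rather than a set of patterns) explains each redundant pattern.

```python
from typing import Dict, List, Set


def greedy_core(significant: Set[str], explains: Dict[str, Set[str]]) -> List[str]:
    """Greedy set-cover heuristic (illustrative only).

    significant: all patterns found to be statistically significant.
    explains:    hypothetical mapping from a pattern to the patterns whose
                 significance it is assumed to explain.
    Returns a list of chosen 'core' patterns in selection order.
    """
    candidates = sorted(significant)  # sorted for deterministic tie-breaking
    uncovered = set(significant)
    core: List[str] = []

    def coverage(p: str) -> Set[str]:
        # A pattern trivially accounts for itself, plus whatever it explains.
        return (explains.get(p, set()) | {p}) & uncovered

    while uncovered:
        # Pick the pattern that accounts for the most still-unexplained patterns.
        best = max(candidates, key=lambda p: len(coverage(p)))
        uncovered -= coverage(best)
        core.append(best)
    return core


# Hypothetical example: one planted motif and its fragments, plus an unrelated pattern.
significant = {"ACGT", "ACG", "CGT", "CG", "TTT"}
explains = {"ACGT": {"ACG", "CGT", "CG"}, "ACG": {"CG"}, "CGT": {"CG"}}
print(greedy_core(significant, explains))
```

In this toy input, the fragments of the planted motif are all accounted for by the motif itself, so only the motif and the unrelated pattern are selected. Because every significant pattern covers at least itself, the loop always makes progress and terminates.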

Cite

Sadoddin, R., Sander, J., & Rafiei, D. (2016). Finding surprisingly frequent patterns of variable lengths in sequence data. In 16th SIAM International Conference on Data Mining 2016, SDM 2016 (pp. 27–35). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974348.4
