A fast and simple method for mining subsequences with surprising event counts

Jefrey Lijffijt

Conference Proceedings, Open Access


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013), Vol. 8188 LNAI (Part 1), pp. 385–400

DOI: 10.1007/978-3-642-40988-2_25


Abstract

We consider the problem of mining subsequences with surprising event counts. When mining patterns, we often test a very large number of potentially present patterns, leading to a high likelihood of finding spurious results. Typically, this problem grows as the size of the data increases. Existing methods for statistical testing are not usable for mining patterns in big data, because they are either computationally too demanding, or fail to take into account the dependency structure between patterns, leading to true findings going unnoticed. We propose a new method to compute the significance of event frequencies in subsequences of a long data sequence. The method is based on analyzing the joint distribution of the patterns, omitting the need for randomization. We argue that computing the p-values exactly is computationally costly, but that an upper bound is easy to compute. We investigate the tightness of the upper bound and compare the power of the test with the alternative of post-hoc correction. We demonstrate the utility of the method on two types of data: text and DNA. We show that the proposed method is easy to implement and can be computed quickly. Moreover, we conclude that the upper bound is sufficiently tight and that meaningful results can be obtained in practice. © 2013 Springer-Verlag.
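For orientation, the post-hoc correction baseline that the abstract compares against can be sketched roughly as follows. This is not the paper's proposed method, which works from the joint distribution of the patterns. It is only a minimal illustration under stated assumptions: events are i.i.d. with a rate estimated from the whole sequence, the candidate subsequences are all sliding windows of one fixed length, and the correction is Bonferroni. The function names `binom_upper_tail` and `surprising_windows` are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def surprising_windows(seq, event, window, alpha=0.05):
    """Flag fixed-length windows whose count of `event` is surprisingly high.

    Assumptions (illustrative, not taken from the paper):
    - null model: each position is the event independently with rate p,
      where p is estimated from the full sequence;
    - every sliding window is one test, corrected with Bonferroni.
    Returns a list of (start_index, event_count, adjusted_p_value).
    """
    n = len(seq)
    if window > n or window <= 0:
        return []
    p = sum(1 for x in seq if x == event) / n
    num_tests = n - window + 1

    # Count events in the first window, then update incrementally.
    count = sum(1 for x in seq[:window] if x == event)
    results = []
    for start in range(num_tests):
        if start > 0:
            count += (seq[start + window - 1] == event) - (seq[start - 1] == event)
        p_raw = binom_upper_tail(count, window, p)
        # Bonferroni: multiply by the number of tests, cap at 1.
        p_adj = min(1.0, p_raw * num_tests)
        if p_adj <= alpha:
            results.append((start, count, p_adj))
    return results


if __name__ == "__main__":
    # Toy sequence: a dense burst of "a" at the end of a long run of "b".
    toy = "b" * 90 + "a" * 10
    for start, count, p_adj in surprising_windows(toy, "a", window=10):
        print(start, count, p_adj)
```

Because overlapping windows are strongly dependent, multiplying by the number of tests is conservative. That loss of power is the weakness of post-hoc correction that the abstract points to.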


Citation (APA)

Lijffijt, J. (2013). A fast and simple method for mining subsequences with surprising event counts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8188 LNAI, pp. 385–400). https://doi.org/10.1007/978-3-642-40988-2_25
