Most significant substring mining based on chi-square measure

Abstract

Given the vast reservoirs of sequence data stored worldwide, efficient mining of string databases drawn from sources such as intrusion detection systems, player statistics, texts, and proteins has emerged as a great challenge. Searching for unusual patterns within long strings of data is a requirement for diverse applications. Given a string, the problem is to identify the substrings that differ the most from the expected or normal behavior, i.e., the substrings that are statistically significant (less likely to occur due to chance alone). To this end, we use the chi-square measure and propose two heuristics for retrieving the top-k substrings with the largest chi-square measure. We show that the algorithms outperform other competing algorithms in runtime, while maintaining a high approximation ratio of more than 0.96. © 2010 Springer-Verlag Berlin Heidelberg.
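
To make the problem statement concrete, here is a minimal sketch of an exhaustive baseline in Python. It is not either of the paper's two heuristics, which the abstract does not describe. It assumes a null model in which characters are drawn independently with fixed probabilities, and it scores a substring of length l with Pearson's chi-square statistic: the sum over characters c of (observed count of c - l * p_c)^2 / (l * p_c). The names `chi_square`, `top_k_substrings`, and `probs` are hypothetical and chosen for illustration only.

```python
import heapq
from typing import Dict, List, Tuple


def chi_square(counts: Dict[str, int], length: int, probs: Dict[str, float]) -> float:
    """Pearson chi-square of observed counts against expected counts length * p.

    Assumption: probs covers every character that can occur, with p > 0.
    """
    return sum(
        (counts.get(c, 0) - length * p) ** 2 / (length * p)
        for c, p in probs.items()
    )


def top_k_substrings(
    text: str, probs: Dict[str, float], k: int
) -> List[Tuple[float, int, int]]:
    """Score every substring text[i:j] and return the k best as (score, i, j)."""
    heap: List[Tuple[float, int, int]] = []  # min-heap of the current top k
    for i in range(len(text)):
        counts: Dict[str, int] = {}
        for j in range(i + 1, len(text) + 1):
            # Extend the substring by one character and update its counts.
            ch = text[j - 1]
            counts[ch] = counts.get(ch, 0) + 1
            item = (chi_square(counts, j - i, probs), i, j)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    # Uniform null model over a two-letter alphabet (an illustrative choice).
    for score, i, j in top_k_substrings("abababaaaaaab", {"a": 0.5, "b": 0.5}, 3):
        print(f"{score:.3f}  text[{i}:{j}]")
```

This baseline examines all O(n^2) substrings of a string of length n, which is the kind of cost a faster heuristic would aim to avoid. An approximation ratio, as reported in the abstract, could then be read as the score found by a heuristic divided by the score found by an exhaustive search such as this one; that reading is an assumption here, not a definition taken from the paper.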

Cite

Dutta, S., & Bhattacharya, A. (2010). Most significant substring mining based on chi-square measure. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6118 LNAI, pp. 319–327). https://doi.org/10.1007/978-3-642-13657-3_35
