Masking patterns in sequences: A new class of motif discovery with don't cares

Abstract

We introduce a new notion of motifs, called masks, that succinctly represents the repeated patterns for an input sequence T of n symbols drawn from an alphabet Σ. We show how to build the set of all frequent maximal masks of length L in O(2^L n) time and space in the worst case, using the Karp-Miller-Rosenberg approach. We analytically show that our algorithm performs better than the method based on constant-time enumerating and checking all the potential (|Σ| + 1)^L candidate patterns in T, after a polynomial-time preprocessing of T. Our algorithm is also cache-friendly, attaining O(2^L sort(n)) block transfers, where sort(n) is the cache complexity of sorting n items. © 2009 Elsevier B.V. All rights reserved.
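
To make the counts in the abstract concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a mask is a binary string of length L in which 1 marks a position that must match and 0 marks a don't care, and that a pattern is frequent when it occurs at least `quorum` times; both the definition and the names `apply_mask`, `frequent_masked_patterns`, `candidate_counts`, and `quorum` are assumptions made for illustration. The sketch enumerates all 2^L masks naively, does not use the Karp-Miller-Rosenberg approach, and does not filter for maximality.

```python
from collections import defaultdict
from itertools import product


def apply_mask(window, mask):
    """Keep characters where the mask bit is '1'; write '.' for don't cares.

    Assumption: '.' does not occur in the input text.
    """
    return "".join(c if bit == "1" else "." for c, bit in zip(window, mask))


def frequent_masked_patterns(text, length, quorum):
    """Naive enumeration over all 2^length masks (illustrative only).

    Returns {mask: {pattern: [start positions]}} restricted to patterns
    that occur at least `quorum` times. Maximality is NOT checked.
    """
    result = {}
    for bits in product("01", repeat=length):
        mask = "".join(bits)
        occurrences = defaultdict(list)
        # Group every length-`length` window of the text by its masked form.
        for start in range(len(text) - length + 1):
            window = text[start:start + length]
            occurrences[apply_mask(window, mask)].append(start)
        frequent = {p: pos for p, pos in occurrences.items() if len(pos) >= quorum}
        if frequent:
            result[mask] = frequent
    return result


def candidate_counts(alphabet_size, length):
    """Return (number of masks, number of candidate patterns with don't cares)."""
    return 2 ** length, (alphabet_size + 1) ** length


found = frequent_masked_patterns("abcabd", length=3, quorum=2)
print(found["110"])             # {'ab.': [0, 3]}
print(candidate_counts(4, 8))   # (256, 390625)
```

For a four-letter alphabet and L = 8, the sketch reports 256 masks against 390625 candidate patterns, which illustrates why iterating over masks rather than over all (|Σ| + 1)^L candidates can pay off.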

Battaglia, G., Cangelosi, D., Grossi, R., & Pisanti, N. (2009). Masking patterns in sequences: A new class of motif discovery with don’t cares. Theoretical Computer Science, 410(43), 4327–4340. https://doi.org/10.1016/j.tcs.2009.07.014
