We introduce a new notion of motifs, called masks, that succinctly represents the repeated patterns for an input sequence T of n symbols drawn from an alphabet Σ. We show how to build the set of all frequent maximal masks of length L in O(2^L n) time and space in the worst case, using the Karp-Miller-Rosenberg approach. We analytically show that our algorithm performs better than the method that, after a polynomial-time preprocessing of T, enumerates and checks in constant time each of the (|Σ| + 1)^L potential candidate patterns in T. Our algorithm is also cache-friendly, attaining O(2^L sort(n)) block transfers, where sort(n) is the cache complexity of sorting n items. © 2009 Elsevier B.V. All rights reserved.
Battaglia, G., Cangelosi, D., Grossi, R., & Pisanti, N. (2009). Masking patterns in sequences: A new class of motif discovery with don’t cares. Theoretical Computer Science, 410(43), 4327–4340. https://doi.org/10.1016/j.tcs.2009.07.014
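
To make the 2^L factor in the bounds above concrete, the following is a minimal brute-force sketch in Python. It is not the Karp-Miller-Rosenberg-based algorithm of the paper, and it does not compute maximality. It rests on an assumption that the abstract does not spell out: that a mask is a binary string of length L whose 1s mark solid positions and whose 0s mark don't-care positions. The function name `frequent_masks`, the quorum parameter `q`, and the use of `.` for a don't care are hypothetical choices made for illustration only.

```python
from collections import defaultdict
from itertools import product


def frequent_masks(text, L, q):
    """Brute-force illustration (assumed mask model, not the paper's method).

    For each of the 2^L binary masks, group the length-L windows of `text`
    by the symbols found at the solid ('1') positions, and keep the groups
    that occur at least `q` times.
    """
    windows = [text[i:i + L] for i in range(len(text) - L + 1)]
    result = {}
    for bits in product("01", repeat=L):      # 2^L candidate masks
        mask = "".join(bits)
        groups = defaultdict(list)
        for start, window in enumerate(windows):  # n - L + 1 windows
            # Keep symbols at solid positions, write '.' for don't cares.
            key = "".join(c if b == "1" else "." for c, b in zip(window, mask))
            groups[key].append(start)
        frequent = {k: v for k, v in groups.items() if len(v) >= q}
        if frequent:
            result[mask] = frequent
    return result


if __name__ == "__main__":
    for mask, patterns in sorted(frequent_masks("abcabd", L=3, q=2).items()):
        print(mask, patterns)
```

The outer loop visits 2^L masks and the inner loop visits n - L + 1 windows, so this sketch touches on the order of 2^L n (mask, window) pairs, with an extra factor of L for building each key. By contrast, a method that enumerates every pattern over Σ plus a don't-care symbol faces (|Σ| + 1)^L candidates regardless of n.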