Hidden pattern statistics


Abstract

We consider the sequence comparison problem, also known as the hidden pattern problem, in which one searches a text for a given subsequence (rather than for a string, understood as a sequence of consecutive symbols). A characteristic parameter is the number of occurrences of a given pattern w of length m as a subsequence in a random text of length n generated by a memoryless source. To define valid occurrences, the spacings between letters of the pattern may be either constrained or unconstrained. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law. These results are obtained via combinatorics on words, formal language techniques, and methods of analytic combinatorics based on generating functions and convergence of moments. The motivation for studying this problem comes from an attempt to find a reliable threshold for intrusion detection, from textual data processing applications, and from molecular biology. © 2011 Springer-Verlag Berlin Heidelberg.
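
To make the counted quantity concrete, here is a minimal sketch for the unconstrained case only (no limits on the spacings). It is not taken from the paper: the function names are hypothetical, the counting routine is a standard dynamic program, and the mean is the elementary formula C(n, m) · p(w_1) ··· p(w_m), which follows from linearity of expectation over the C(n, m) possible position sets. The variance and the Gaussian limit law are not illustrated.

```python
from itertools import product
from math import comb, prod


def count_occurrences(text, pattern):
    """Count occurrences of `pattern` as a subsequence of `text` (no spacing constraints)."""
    # ways[j] = number of ways to embed pattern[:j] in the part of the text read so far.
    ways = [1] + [0] * len(pattern)
    for ch in text:
        # Go right to left so each text symbol extends an embedding at most once.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[-1]


def expected_occurrences(n, pattern, probs):
    """Mean count in a random text of length n from a memoryless source.

    `probs` maps each symbol to its probability (assumed to sum to 1).
    """
    return comb(n, len(pattern)) * prod(probs[c] for c in pattern)


def brute_force_mean(n, pattern, probs):
    """Exhaustive check of the mean over all texts of length n (small n only)."""
    total = 0.0
    for symbols in product(probs, repeat=n):
        weight = prod(probs[c] for c in symbols)
        total += weight * count_occurrences(symbols, pattern)
    return total


print(count_occurrences("abab", "ab"))  # 3
```

For a small alphabet and short text, `brute_force_mean(4, "ab", {"a": 0.3, "b": 0.7})` and `expected_occurrences(4, "ab", {"a": 0.3, "b": 0.7})` should agree up to floating-point rounding, which gives a quick sanity check of the mean formula.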

Cite

Flajolet, P., Guivarch, Y., Szpankowski, W., & Vallée, B. (2001). Hidden pattern statistics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2076 LNCS, pp. 152–165). Springer Verlag. https://doi.org/10.1007/3-540-48224-5_13