Approximate word matches between two random sequences

Abstract

Given two sequences over a finite alphabet L, the D₂ statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the D₂ statistic in the context of DNA sequences, under the assumption of strand-symmetric Bernoulli text. For k < m, we look at the count of m-letter word matches with up to k mismatches. For this statistic, we compute the expectation, give upper and lower bounds for the variance, and prove that its distribution is asymptotically normal. © Institute of Mathematical Statistics, 2008.
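
To make the statistic concrete, here is a minimal brute-force sketch (not the authors' code; the function names are hypothetical). `approx_word_matches` counts all pairs of m-letter words, one word from each sequence, that differ in at most k positions. With k = 0 this is the ordinary D₂ count. `expected_approx_matches` uses the standard linearity-of-expectation argument. It assumes that the two sequences are independent of each other and that each is i.i.d. ("Bernoulli") text with the given letter probabilities. Under those assumptions, two aligned letters agree with probability p = Σ p_a², and an m-word pair has at most k mismatches with probability Σ_{j≤k} C(m, j) p^{m−j} (1−p)^j. The sketch does not reproduce the paper's variance bounds or its normality result.

```python
from math import comb


def approx_word_matches(a: str, b: str, m: int, k: int) -> int:
    """Count pairs (i, j) whose m-letter words a[i:i+m] and b[j:j+m]
    differ in at most k positions (k = 0 gives the D2 statistic)."""
    if not 0 <= k < m:
        raise ValueError("need 0 <= k < m")
    count = 0
    for i in range(len(a) - m + 1):
        u = a[i:i + m]
        for j in range(len(b) - m + 1):
            v = b[j:j + m]
            # Hamming distance between the two aligned words
            mismatches = sum(x != y for x, y in zip(u, v))
            if mismatches <= k:
                count += 1
    return count


def expected_approx_matches(n_a: int, n_b: int, m: int, k: int,
                            probs: dict[str, float]) -> float:
    """Expected count, assuming two independent i.i.d. sequences of
    lengths n_a and n_b with letter probabilities `probs`."""
    if not 0 <= k < m:
        raise ValueError("need 0 <= k < m")
    p = sum(q * q for q in probs.values())  # P(two random letters agree)
    # P(an m-word pair has at most k mismatches): binomial tail
    per_pair = sum(comb(m, j) * p ** (m - j) * (1 - p) ** j
                   for j in range(k + 1))
    pairs = max(0, n_a - m + 1) * max(0, n_b - m + 1)
    return pairs * per_pair
```

For example, `approx_word_matches("ACGT", "ACGA", 2, 0)` finds the two exact matches AC/AC and CG/CG. Raising k to 1 adds GT/GA, giving 3. A strand-symmetric composition would be passed as, for instance, `{"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}`, with p_A = p_T and p_C = p_G. The double loop costs O(n_a · n_b · m), which is fine for illustration but not for database-scale searches.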

Citation (APA)

Burden, C. J., Kantorovitz, M. R., & Wilson, S. R. (2008). Approximate word matches between two random sequences. Annals of Applied Probability, 18(1), 1–21. https://doi.org/10.1214/07-AAP452