Approximate word matches between two random sequences

Conrad J. Burden; Miriam R. Kantorovitz; Susan R. Wilson

Journal article, open access

Annals of Applied Probability (2008) 18(1) 1-21

DOI: 10.1214/07-AAP452

Abstract

Given two sequences over a finite alphabet L, the D2 statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag (EST) database searches. Here we study a generalization of the D2 statistic in the context of DNA sequences, under the assumption of strand-symmetric Bernoulli text. For k < m, we look at the count of m-letter word matches with up to k mismatches. For this statistic, we compute the expectation, give upper and lower bounds for the variance, and prove that its distribution is asymptotically normal. © Institute of Mathematical Statistics, 2008.
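To make the statistic concrete, here is a minimal brute-force sketch in Python. It is not the authors' code, and the names `d2_k`, `expected_d2_k` and `strand_symmetric_probs` are hypothetical. It counts pairs of m-letter words, one from each sequence, that differ in at most k positions (k = 0 gives the ordinary D2 statistic). It also computes the mean under i.i.d. (Bernoulli) letters. This part uses a standard linearity-of-expectation argument rather than a formula quoted from the paper. Two independent letters agree with probability p2 = sum over letters a of p_a^2. A given word pair therefore has at most k mismatches with probability sum_{j=0..k} C(m, j) p2^(m-j) (1 - p2)^j. Multiplying by the (n - m + 1)(n̄ - m + 1) word pairs gives the expectation. Strand symmetry (p_A = p_T, p_C = p_G) is encoded here, as an assumption for illustration, by a single GC-fraction parameter.

```python
from math import comb
from typing import Dict


def d2_k(a: str, b: str, m: int, k: int = 0) -> int:
    """Count pairs (i, j) where the m-letter words a[i:i+m] and b[j:j+m]
    differ in at most k positions. k = 0 is the exact-match D2 statistic."""
    if not 0 <= k < m:  # the generalization considers k < m
        raise ValueError("require 0 <= k < m")
    count = 0
    for i in range(len(a) - m + 1):
        u = a[i:i + m]
        for j in range(len(b) - m + 1):
            v = b[j:j + m]
            mismatches = sum(x != y for x, y in zip(u, v))
            if mismatches <= k:
                count += 1
    return count


def strand_symmetric_probs(gc: float) -> Dict[str, float]:
    """Hypothetical helper: letter probabilities with p_A = p_T and
    p_C = p_G, parameterized by the GC fraction."""
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must lie in [0, 1]")
    return {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}


def expected_d2_k(n: int, n_bar: int, m: int, k: int,
                  probs: Dict[str, float]) -> float:
    """Mean of d2_k for independent i.i.d. sequences of lengths n and n_bar."""
    p2 = sum(p * p for p in probs.values())  # P(two random letters agree)
    # P(at most k mismatches among m independent aligned positions)
    p_word = sum(comb(m, j) * p2 ** (m - j) * (1 - p2) ** j
                 for j in range(k + 1))
    word_pairs = max(0, n - m + 1) * max(0, n_bar - m + 1)
    return word_pairs * p_word
```

For example, `d2_k("ACGT", "ACGA", 2, 0)` returns 2 (the shared words AC and CG). With `k=1` it returns 3, because GT and GA now also count. The double loop costs O(n · n̄ · m) and is meant for checking small cases, not for database-scale searches.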

Citation (APA)

Burden, C. J., Kantorovitz, M. R., & Wilson, S. R. (2008). Approximate word matches between two random sequences. Annals of Applied Probability, 18(1), 1–21. https://doi.org/10.1214/07-AAP452