Exact distribution of a spaced seed statistic for DNA homology detection


Abstract

Let a seed, S, be a string from the alphabet {1,*}, of arbitrary length k, which starts and ends with a 1. For example, S = 11*1. S occurs in a binary string T at position h if the length-k substring of T ending at position h contains a 1 in every position where there is a 1 in S. We say that the 1s at the corresponding positions in T are covered. We are interested in calculating the probability distribution for the number of 1s covered by a seed S in an iid Bernoulli string of length n with probability of 1 equal to p. We refer to this new probability distribution as C_{nSp}, for covered, with S being the seed. We present an efficient method to calculate this distribution exactly. Covered 1s represent matching positions detected in DNA sequences when using multiple hits of a spaced seed. Knowledge of the distribution provides a statistical threshold for distinguishing true homologies from randomly matching sequences. © 2009 Springer Berlin Heidelberg.
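To make the definition of covered 1s concrete, the following is a minimal Python sketch that counts covered positions for a given seed and binary string, and then obtains the distribution by brute-force enumeration of all 2^n strings. This is not the efficient method the abstract refers to; it is only an illustration of the definition. The function names `covered_count` and `covered_distribution` are hypothetical, and positions are indexed from 0 here as a convenience.

```python
from itertools import product


def covered_count(seed: str, text: str) -> int:
    """Number of 1s in `text` covered by at least one occurrence of `seed`.

    `seed` is a string over {'1', '*'}; `text` is a string over {'0', '1'}.
    """
    k = len(seed)
    # Offsets within the seed that require a 1 in the text.
    ones = [i for i, c in enumerate(seed) if c == "1"]
    covered = set()
    for start in range(len(text) - k + 1):
        # The seed occurs in this window if every required offset holds a 1.
        if all(text[start + i] == "1" for i in ones):
            covered.update(start + i for i in ones)
    return len(covered)


def covered_distribution(seed: str, n: int, p: float) -> list[float]:
    """Brute-force distribution of the covered count (illustration only).

    Enumerates every binary string of length n and weights it by its
    iid Bernoulli(p) probability. dist[c] is P(covered count == c).
    """
    dist = [0.0] * (n + 1)
    for bits in product("01", repeat=n):
        text = "".join(bits)
        m = text.count("1")
        dist[covered_count(seed, text)] += p**m * (1 - p) ** (n - m)
    return dist


if __name__ == "__main__":
    print(covered_count("11*1", "1101"))
    print(covered_distribution("11*1", 4, 0.5))
```

Because the enumeration visits 2^n strings, this sketch is only practical for small n; it can serve as a reference for checking a faster implementation on short strings.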

Cite

Benson, G., & Mak, D. Y. F. (2008). Exact distribution of a spaced seed statistic for DNA homology detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5280 LNCS, pp. 282–293). Springer Verlag. https://doi.org/10.1007/978-3-540-89097-3_27
