Exact distribution of a spaced seed statistic for DNA homology detection

Gary Benson; Denise Y.F. Mak

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 5280 LNCS 282-293

DOI: 10.1007/978-3-540-89097-3_27

6 Citations

7 Readers

Abstract

Let a seed, S, be a string over the alphabet {1,*}, of arbitrary length k, which starts and ends with a 1. For example, S = 11*1. S occurs in a binary string T at position h if the length-k substring of T ending at position h contains a 1 in every position where there is a 1 in S. We say that the 1s at the corresponding positions in T are covered. We are interested in calculating the probability distribution of the number of 1s covered by a seed S in an iid Bernoulli string of length n in which each position is 1 with probability p. We refer to this new probability distribution as C(n, S, p), for covered, with S being the seed. We present an efficient method to calculate this distribution exactly. Covered 1s represent matching positions detected in DNA sequences when using multiple hits of a spaced seed. Knowledge of the distribution provides a statistical threshold for distinguishing true homologies from randomly matching sequences. © 2009 Springer Berlin Heidelberg.
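
To make the definition of covered 1s concrete, the following is a minimal Python sketch. It is not the efficient method of the paper, which the abstract does not describe; it only counts covered 1s directly from the definition and, for small n, obtains the distribution by enumerating all 2^n binary strings. The function names covered_count and covered_distribution are hypothetical, chosen here for illustration.

```python
from itertools import product


def covered_count(seed, text):
    """Count the 1s in `text` covered by at least one occurrence of `seed`.

    `seed` is a string over {'1', '*'}; `text` is a string over {'0', '1'}.
    Only text positions aligned with a '1' of the seed count as covered.
    """
    k = len(seed)
    one_offsets = [i for i, c in enumerate(seed) if c == "1"]
    covered = set()
    for start in range(len(text) - k + 1):
        # The seed occurs here if every '1' of the seed sits over a '1' of the text.
        if all(text[start + i] == "1" for i in one_offsets):
            covered.update(start + i for i in one_offsets)
    return len(covered)


def covered_distribution(seed, n, p):
    """Brute-force distribution of the covered count (assumption: small n only).

    Returns a list `dist` where dist[c] is the probability that exactly c
    ones are covered in an iid Bernoulli(p) string of length n.
    Runs in time exponential in n; this is an illustration, not the
    paper's algorithm.
    """
    dist = [0.0] * (n + 1)
    for bits in product("01", repeat=n):
        text = "".join(bits)
        ones = text.count("1")
        dist[covered_count(seed, text)] += p**ones * (1 - p) ** (n - ones)
    return dist
```

For the seed 11*1 and T = 110111, the seed occurs only at the first four positions, so three 1s are covered; the two trailing 1s are matches that no seed occurrence detects. Overlapping occurrences share covered positions, which is why the count is taken over a set rather than summed per occurrence.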

Cite

APA

Benson, G., & Mak, D. Y. F. (2008). Exact distribution of a spaced seed statistic for DNA homology detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5280 LNCS, pp. 282–293). Springer Verlag. https://doi.org/10.1007/978-3-540-89097-3_27
