Compound Poisson approximation of word counts in DNA sequences


Abstract

Identifying words with unexpected frequencies is an important problem in the analysis of long DNA sequences. Solving it requires an approximation of the distribution of the number of occurrences N(W) of a word W. Modeling DNA sequences with order-m Markov chains, we use the Chen-Stein method to obtain Poisson approximations for two different counts. We approximate the "declumped" count of W (the number of clumps of overlapping occurrences) by a Poisson variable, and the number of occurrences N(W) by a compound Poisson variable. Combinatorial results are used to handle the general case of overlapping words and to calculate the parameters of these distributions. © 1995 EDP Sciences.
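To make the two counts concrete, here is a minimal sketch, assuming that a clump is a maximal run of occurrences chained together by overlaps. N(W) is then the sum of the clump sizes, while the declumped count is the number of clumps. The function names (`occurrence_positions`, `periods`, `clump_sizes`) are hypothetical and used only for illustration. The code does not reproduce the paper's Markov-chain parameters or its Chen-Stein error bounds.

```python
from typing import List


def occurrence_positions(seq: str, word: str) -> List[int]:
    """Start positions of every occurrence of `word` in `seq`, overlaps included."""
    h = len(word)
    return [i for i in range(len(seq) - h + 1) if seq[i:i + h] == word]


def periods(word: str) -> List[int]:
    """Shifts p (1 <= p < h) at which `word` can overlap itself,
    i.e. its suffix of length h - p equals its prefix of length h - p."""
    h = len(word)
    return [p for p in range(1, h) if word[p:] == word[:h - p]]


def clump_sizes(seq: str, word: str) -> List[int]:
    """Sizes of the clumps of `word` in `seq` (assumed definition: maximal
    chains of overlapping occurrences). len(result) is the declumped count;
    sum(result) is N(W)."""
    h = len(word)
    sizes: List[int] = []
    prev = None
    for i in occurrence_positions(seq, word):
        if prev is None or i - prev >= h:
            sizes.append(1)   # no overlap with the previous occurrence: new clump
        else:
            sizes[-1] += 1    # overlaps the previous occurrence: same clump
        prev = i
    return sizes


if __name__ == "__main__":
    seq, w = "ATATATTATA", "ATA"
    sizes = clump_sizes(seq, w)
    # N(W), declumped count, clump sizes, periods of W
    print(sum(sizes), len(sizes), sizes, periods(w))  # prints: 3 2 [2, 1] [2]
```

In this toy sequence, "ATA" has period 2, so two of its three occurrences fall in a single clump. This is why N(W) is modeled as a sum of clump sizes (compound Poisson) rather than as a plain Poisson count. A word with no periods, such as "ACG", always has clumps of size 1, so the two counts coincide.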

Citation (APA)

Schbath, S. (1997). Compound Poisson approximation of word counts in DNA sequences. ESAIM: Probability and Statistics, 1, 1–16. https://doi.org/10.1051/ps:1997100
