Statistical Distributions of Sequencing by Synthesis with Probabilistic Nucleotide Incorporation

  • Kong Y


Sequencing by synthesis is used in many next-generation DNA sequencing technologies. Some of these technologies, especially those exploring the principle of single-molecule sequencing, allow incomplete nucleotide incorporation in each cycle. We derive statistical distributions for sequencing by synthesis that take into account the possibility that nucleotide incorporation may not be complete in each flow cycle. The distributions are expressed in terms of the nucleotide probabilities of the target sequences and the incorporation probability of each nucleotide. We give exact distributions both for a fixed number of flow cycles and for a fixed sequence length, and derive explicit formulas for the mean and variance of these distributions. The results generalize our previous work on pyrosequencing. Incomplete nucleotide incorporation leads to significant changes in the mean and variance of the distributions, but the distributions can still be approximated by normal distributions with the same mean and variance. The results are also generalized to handle sequence-context-dependent incorporation. The statistical distributions will be useful for instrument and software development for sequencing by synthesis platforms.
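
The fixed-number-of-flow-cycles setting described in the abstract can be explored numerically with a small Monte Carlo sketch. This is not the paper's derivation and does not reproduce its formulas. It assumes a toy model in which each cycle flows the four nucleotides in the order ACGT, template bases are drawn independently from the given nucleotide probabilities, and during a flow each matching base is incorporated with a per-nucleotide probability, with extension stopping for that flow at the first failure. The names `FLOW_ORDER`, `simulate_length`, and `summarize` are hypothetical and chosen for illustration only.

```python
import random
from statistics import mean, pvariance

FLOW_ORDER = "ACGT"  # assumed flow order within one cycle (illustrative)


def simulate_length(n_cycles, base_probs, incorp_probs, rng):
    """Return the number of bases read after n_cycles flow cycles (toy model).

    Assumes at least two bases have nonzero probability, otherwise a run
    with incorporation probability 1 would never terminate.
    """
    bases = list(base_probs)
    weights = [base_probs[b] for b in bases]
    length = 0
    # The template is generated lazily, one base at a time.
    next_base = rng.choices(bases, weights)[0]
    for _ in range(n_cycles):
        for flow in FLOW_ORDER:
            # Extend while the template matches the flowed nucleotide and
            # the (possibly incomplete) incorporation succeeds.
            while next_base == flow and rng.random() < incorp_probs[flow]:
                length += 1
                next_base = rng.choices(bases, weights)[0]
    return length


def summarize(n_cycles, base_probs, incorp_probs, n_trials=2000, seed=0):
    """Estimate the mean and variance of the read length by simulation."""
    rng = random.Random(seed)
    lengths = [
        simulate_length(n_cycles, base_probs, incorp_probs, rng)
        for _ in range(n_trials)
    ]
    return mean(lengths), pvariance(lengths)


if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}
    complete = {b: 1.0 for b in "ACGT"}   # pyrosequencing-like limit
    partial = {b: 0.5 for b in "ACGT"}    # incomplete incorporation
    print("complete:", summarize(20, uniform, complete))
    print("partial: ", summarize(20, uniform, partial))
```

Setting every incorporation probability to 1 recovers complete incorporation under this toy model, so comparing the two printed summaries gives a rough sense of how incomplete incorporation shifts the mean and variance. A histogram of the simulated lengths could similarly be compared against a normal density with the estimated mean and variance.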

Author-supplied keywords

  • combinatorics
  • next-generation DNA sequencing
  • probability
  • sequence analysis
