Information Theory of DNA Sequencing

  • Motahari A
  • Bresler G

Abstract

DNA sequencing is the basic workhorse of modern day biology and medicine. Shotgun sequencing is the dominant technique used: many randomly located short fragments called reads are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. A basic question is: given a sequencing technology and the statistics of the DNA sequence, what is the minimum number of reads required for reliable reconstruction? This number provides a fundamental limit to the performance of any assembly algorithm. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we formulate this question in terms of an information theoretic notion of sequencing capacity. This is the asymptotic ratio of the length of the DNA sequence to the minimum number of reads required to reconstruct it reliably. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.
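
As a rough illustration of the shotgun read process described in the abstract, the sketch below draws reads at uniformly random positions from a toy sequence and reports the ratio of sequence length to number of reads. Everything here is an assumption made for illustration and is not taken from the paper: the i.i.d. uniform base model, the noise-free reads, the linear (non-circular) sequence, and the names `random_dna`, `sample_reads`, and `covers_sequence`. Full coverage is only a necessary condition for reconstruction, not a sufficient one.

```python
import random


def random_dna(length, rng):
    """Draw an i.i.d. uniform sequence over A, C, G, T (assumed toy model)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def sample_reads(sequence, num_reads, read_length, rng):
    """Extract noise-free reads starting at uniformly random positions.

    Returns (start, read) pairs; the start is kept only for checking
    coverage and would not be known to an assembler.
    """
    last_start = len(sequence) - read_length
    starts = [rng.randint(0, last_start) for _ in range(num_reads)]
    return [(s, sequence[s:s + read_length]) for s in starts]


def covers_sequence(reads, sequence_length, read_length):
    """Return True if every position is covered by at least one read."""
    covered = [False] * sequence_length
    for start, _ in reads:
        for i in range(start, start + read_length):
            covered[i] = True
    return all(covered)


rng = random.Random(0)            # fixed seed so the run is repeatable
genome = random_dna(1000, rng)    # hypothetical sequence length
reads = sample_reads(genome, 200, 50, rng)

# Ratio of sequence length to number of reads for this one experiment.
rate = len(genome) / len(reads)
print("length / reads:", rate)
print("fully covered:", covers_sequence(reads, len(genome), 50))
```

The ratio printed here is for a single finite experiment. The sequencing capacity in the abstract is the asymptotic version of this ratio, taken at the minimum number of reads that still allows reliable reconstruction.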

Cite

Motahari, A., & Bresler, G. (2012). Information Theory of DNA Sequencing. arXiv preprint arXiv:1203.6233, 1–33. Retrieved from http://arxiv.org/abs/1203.6233
