Next-generation DNA sequencing.

by Jay Shendure, Hanlee Ji
Nature Biotechnology (2008)

Abstract

DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data-analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.

Available from www.ncbi.nlm.nih.gov

NATURE BIOTECHNOLOGY VOLUME 26 NUMBER 10 OCTOBER 2008 1135
Sanger sequencing
Since the early 1990s, DNA sequence production has almost exclusively
been carried out with capillary-based, semi-automated implementa-
tions of the Sanger biochemistry3,5,6 (Fig. 1a). In high-throughput
production pipelines, DNA to be sequenced is prepared by one of two
approaches: first, for shotgun de novo sequencing, randomly frag-
mented DNA is cloned into a high-copy-number plasmid, which is
then used to transform Escherichia coli; or second, for targeted rese-
quencing, PCR amplification is carried out with primers that flank
the target. The output of both approaches is an amplified template,
either as many ‘clonal’ copies of a single plasmid insert present within
a spatially isolated bacterial colony that can be picked, or as many PCR
amplicons present within a single reaction volume. The sequencing
biochemistry takes place in a ‘cycle sequencing’ reaction, in which cycles
of template denaturation, primer annealing and primer extension are
performed. The primer is complementary to known sequence immedi-
ately flanking the region of interest. Each round of primer extension is
stochastically terminated by the incorporation of fluorescently labeled
dideoxynucleotides (ddNTPs). In the resulting mixture of end-labeled
extension products, the label on the terminating ddNTP of any given
fragment corresponds to the nucleotide identity of its terminal position.
Sequence is determined by high-resolution electrophoretic separation
of the single-stranded, end-labeled extension products in a capillary-
based polymer gel. Laser excitation of fluorescent labels as fragments
of discrete lengths exit the capillary, coupled to four-color detection of
emission spectra, provides the readout that is represented in a Sanger
sequencing ‘trace’. Software translates these traces into DNA sequence,
while also generating error probabilities for each base-call7,8. The
approach that is taken for subsequent analysis—for example, genome
assembly or variant identification—depends on precisely what is being
sequenced and why. Simultaneous electrophoresis in 96 or 384 indepen-
dent capillaries provides a limited level of parallelization.
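The chain-termination readout described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function names and the `p_terminate` parameter are hypothetical, termination is modeled as a fixed per-base probability, and electrophoresis is reduced to sorting fragments by length. The template and primer are the ones shown in Figure 1a.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def chain_termination_ladder(template_3to5, primer, n_molecules=2000,
                             p_terminate=0.2, seed=0):
    """Simulate ddNTP-terminated extension products (hypothetical helper).

    The template is written 3'->5', so the newly synthesized strand,
    written 5'->3', is its position-wise complement.
    """
    rng = random.Random(seed)
    full_product = "".join(COMPLEMENT[base] for base in template_3to5)
    if not full_product.startswith(primer):
        raise ValueError("primer does not anneal at the start of the template")
    ladder = set()
    for _ in range(n_molecules):
        # Extend one base at a time; each step may incorporate a labeled
        # ddNTP, which stops extension and end-labels the fragment.
        for length in range(len(primer) + 1, len(full_product) + 1):
            if rng.random() < p_terminate:
                ladder.add(full_product[:length])
                break
    return ladder


def read_ladder(ladder, primer_length):
    """Read the sequence from the size-sorted ladder, one base per fragment."""
    by_size = sorted(ladder, key=len)  # shorter fragments exit the capillary first
    lengths = [len(fragment) for fragment in by_size]
    expected = list(range(primer_length + 1, primer_length + 1 + len(by_size)))
    if lengths != expected:
        raise ValueError("ladder has gaps; sequence cannot be read")
    # The label on each fragment identifies its terminal base.
    return "".join(fragment[-1] for fragment in by_size)


template = "GACTAGATACGAGCGTGA"  # 3'->5', as drawn in Figure 1a
primer = "CTGAT"
ladder = chain_termination_ladder(template, primer)
print(read_ladder(ladder, len(primer)))
```

Each fragment contributes exactly one base call, which is why the read length is bounded by how well the gel resolves fragments that differ by a single nucleotide.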
After three decades of gradual improvement, the Sanger biochem-
istry can be applied to achieve read-lengths of up to ~1,000 bp, and
per-base ‘raw’ accuracies as high as 99.999%. In the context of high-
throughput shotgun genomic sequencing, Sanger sequencing costs
on the order of $0.50 per kilobase.
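To put the accuracy and cost figures above in perspective, the arithmetic can be sketched as follows. The logarithmic quality scale used here (Q = -10·log10 of the error probability) is the common Phred-style convention and is an assumption on our part, since the text does not name a scale; the function names are hypothetical.

```python
import math


def phred_from_error(p_error):
    """Phred-style quality score for a per-base error probability (assumed scale)."""
    return -10.0 * math.log10(p_error)


def error_from_phred(q):
    """Inverse of phred_from_error."""
    return 10.0 ** (-q / 10.0)


def expected_errors(read_length_bp, accuracy):
    """Expected number of miscalled bases in a read of the given length."""
    return read_length_bp * (1.0 - accuracy)


def sanger_cost_usd(bases, usd_per_kb=0.50):
    """Cost at the roughly $0.50 per kilobase quoted in the text."""
    return bases / 1000.0 * usd_per_kb


# A raw accuracy of 99.999% is an error probability of 1e-5 per base.
print(phred_from_error(1e-5))
print(expected_errors(1000, 0.99999))
print(sanger_cost_usd(1_000_000))
```

On these numbers, a 1,000 bp read at the best raw accuracy is expected to contain about 0.01 errors, and one megabase of raw sequence costs on the order of $500.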
The field of DNA sequencing technology development has a rich
and diverse history1,2. However, the overwhelming majority of DNA
sequence production to date has relied on some version of the Sanger
biochemistry3. Over the past five years, the incentive for develop-
ing entirely new strategies for DNA sequencing has emerged on at
least four levels, undeniably reinvigorating this field (for a review,
see ref. 4). First, in the wake of the Human Genome Project, there
are few remaining avenues of optimization through which signifi-
cant reductions in the cost of conventional DNA sequencing can be
achieved. Second, the potential utility of short-read sequencing has
been tremendously strengthened by the availability of whole genome
assemblies for Homo sapiens and all major model organisms, as these
effectively provide a reference against which short reads can be
mapped. Third, a growing variety of molecular methods have been
developed, whereby a broad range of biological phenomena can be
assessed by high-throughput DNA sequencing (e.g., genetic varia-
tion, RNA expression, protein-DNA interactions and chromosome
conformation). And fourth, general progress in technology across
disparate fields, including microscopy, surface chemistry, nucleotide
biochemistry, polymerase engineering, computation, data storage and
others, has made alternative strategies for DNA sequencing increasingly
practical to realize.
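The second point, that a reference assembly lets short reads be placed by mapping rather than assembled de novo, can be illustrated with a toy seed-and-verify mapper. This is a minimal sketch, not the method of any particular alignment tool: the function names, the k-mer length and the mismatch threshold are all hypothetical choices, and the reference string is made up for the example.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return reference start positions where the read aligns without gaps."""
    hits = set()
    for offset in range(len(read) - k + 1):
        # Seed: an exact k-mer match proposes a candidate start position.
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            # Verify: count mismatches over the full read length.
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


reference = "ACGTTAGCCATGGATCCGATTACA"  # made-up reference sequence
index = build_index(reference, k=4)
print(map_read("CATGGATCCG", reference, index, k=4))  # exact read
print(map_read("CATGGTTCCG", reference, index, k=4))  # read with one mismatch
```

Because every read is placed independently against a fixed reference, even reads far too short to assemble on their own can be interpreted, provided they are long enough to be unique in the genome.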
Here, we review the current crop of next-generation DNA sequenc-
ing platforms: how they work, their relative strengths and limitations,
and current and emerging applications. We briefly discuss related
developments in this field, such as new software tools and front-end
methods for isolating arbitrary genomic subsets. We emphasize that
the DNA sequencing technology field has become a quickly mov-
ing target, and we can at best provide a snapshot of this particular
moment.
Jay Shendure1 & Hanlee Ji2
1Department of Genome Sciences, University of Washington, Foege Building
S-250, Box 355065, 1705 NE Pacific St., Seattle, Washington 98195-5065,
USA. 2Stanford Genome Technology Center and Division of Oncology, Dept. of
Medicine, Stanford University School of Medicine, CCSR 3215, 269 Campus
Drive, Stanford, California 94305, USA. Correspondence should be addressed to
J.S. (shendure@u.washington.edu) or H.J. (genomics_ji@stanford.edu).
Published online 9 October 2008; doi:10.1038/nbt1486
REVIEW
© 2008 Nature Publishing Group http://www.nature.com/naturebiotechnology
Second-generation DNA sequencing
Alternative strategies for DNA sequencing can be grouped into several categories (as discussed previously in ref. 4). These include (i) microelectrophoretic methods9 (Box 1), (ii) sequencing by hybridization10 (Box 2), (iii) real-time observation of single molecules11,12 (Box 3) and (iv) cyclic-array sequencing (J.S. et al.13 and ref. 14). Here, we use ‘second-generation’ in reference to the various implementations of cyclic-array sequencing that have recently been realized in a commercial product (e.g., 454 sequencing (used in the 454 Genome Sequencers, Roche Applied Science; Basel), Solexa technology (used in the Illumina (San Diego) Genome Analyzer), the SOLiD platform (Applied Biosystems; Foster City, CA, USA), the Polonator (Dover/Harvard) and the HeliScope Single Molecule Sequencer technology (Helicos; Cambridge, MA, USA)). The concept of cyclic-array sequencing can be summarized as the sequencing of a dense array of DNA features by iterative cycles of enzymatic manipulation and imaging-based data collection15 (Shendure and colleagues16). Two reports in 2005 described the first integrated implementations of cyclic-array strategies that were both practical and cost-competitive with conventional sequencing (J.S. et al.13 and ref. 14), and other groups have quickly followed17,18.
Although these platforms are quite diverse in sequencing biochemistry as well as in how the array is generated, their work flows are conceptually similar (Fig. 1b). Library preparation is accomplished by random fragmentation of DNA, followed by in vitro ligation of common adaptor sequences. Alternative protocols can be used to generate jumping libraries of mate-paired tags with controllable distance distributions13,19. The generation of clonally clustered amplicons to serve as sequencing features can be achieved by several approaches, including in situ polonies15, emulsion PCR20 or bridge PCR21,22 (Fig. 2). What is common to these methods is that PCR amplicons derived from any given single library molecule end up spatially clustered, either to a single location on a planar substrate (in situ polonies, bridge PCR), or to the surface of micron-scale beads, which can be recovered and arrayed (emulsion PCR). The sequencing process itself consists of alternating cycles of enzyme-driven biochemistry and imaging-based data acquisition (Fig. 3). The platforms that are discussed here all rely on sequencing by synthesis, that is, serial extension of primed templates, but the enzyme driving the synthesis can be either a polymerase16,23 or a ligase13,24. Data are acquired by imaging of the full array at each cycle (e.g., of fluorescently labeled nucleotides incorporated by a polymerase).
Global advantages of second-generation or cyclic-array strategies, relative to Sanger sequencing, include the following: (i) in vitro construction of a sequencing library, followed by in vitro clonal amplification to generate sequencing features, circumvents several bottlenecks that restrict the parallelism of conventional sequencing (that is, transformation of E. coli and colony picking). (ii) Array-based sequencing enables a much higher degree of parallelism than conventional capillary-based sequencing. As the effective size of sequencing features can be on the order of 1 μm, hundreds of millions of sequencing reads can potentially be obtained in parallel by rastered imaging of a reasonably sized surface area. (iii) Because array features are immobilized to a planar surface, they can be enzymatically manipulated by a single reagent volume. Although microliter-scale reagent volumes are used in practice, these are essentially amortized over the full set of sequencing features on the array, dropping the effective reagent volume per feature to the
[Figure 1 schematic. (a) Sanger workflow: DNA fragmentation; in vivo cloning and amplification; cycle sequencing (polymerase, dNTPs, labeled ddNTPs) on template 3'-…GACTAGATACGAGCGTGA…-5' with primer 5'-…CTGAT, giving terminated products from …CTGATC to …CTGATCTATGCTCG; electrophoresis (1 read/capillary). (b) Cyclic-array workflow: DNA fragmentation; in vitro adaptor ligation; generation of polony array; cyclic array sequencing (>10^6 reads/array), with cycles 1, 2 and 3 asking "What is base 1?", "What is base 2?" and "What is base 3?".]
Figure 1 Work flow of conventional versus second-generation sequencing. (a) With high-throughput
shotgun Sanger sequencing, genomic DNA is fragmented, then cloned to a plasmid vector and
used to transform E. coli. For each sequencing reaction, a single bacterial colony is picked and
plasmid DNA isolated. Each cycle sequencing reaction takes place within a microliter-scale volume,
generating a ladder of ddNTP-terminated, dye-labeled products, which are subjected to high-resolution
electrophoretic separation within one of 96 or 384 capillaries in one run of a sequencing instrument. As
fluorescently labeled fragments of discrete sizes pass a detector, the four-channel emission spectrum
is used to generate a sequencing trace. (b) In shotgun sequencing with cyclic-array methods, common
adaptors are ligated to fragmented genomic DNA, which is then subjected to one of several protocols
that results in an array of millions of spatially immobilized PCR colonies or ‘polonies’15. Each polony
consists of many copies of a single shotgun library fragment. As all polonies are tethered to a planar
array, a single microliter-scale reagent volume (e.g., for primer hybridization and then for enzymatic
extension reactions) can be applied to manipulate all array features in parallel. Similarly, imaging-based
detection of fluorescent labels incorporated with each extension can be used to acquire sequencing
data on all features in parallel. Successive iterations of enzymatic interrogation and imaging are used to
build up a contiguous sequencing read for each array feature.
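The cyclic-array loop in panel (b) can be sketched in a few lines. This is an idealized illustration rather than a model of any specific platform: the function name is hypothetical, each array feature is represented by its template strand written 3'->5', one base is incorporated and imaged per cycle, and a feature whose template is exhausted yields an "N" call.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def cyclic_array_sequencing(features, n_cycles):
    """Build one read per array feature, one base per cycle (idealized)."""
    reads = [""] * len(features)
    for cycle in range(n_cycles):
        # Biochemistry step: a single reagent volume extends every primed
        # template on the array by one complementary base.
        # Imaging step: the whole array is imaged, giving one call per feature.
        image = [
            COMPLEMENT[template[cycle]] if cycle < len(template) else "N"
            for template in features
        ]
        reads = [read + call for read, call in zip(reads, image)]
    return reads


# Three toy features; a real array holds millions.
array = ["GACT", "TTGA", "CC"]
print(cyclic_array_sequencing(array, n_cycles=3))
```

The number of cycles sets the read length, while the number of features sets the number of reads, so throughput scales with array density without any increase in the number of reagent additions.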