
Next-generation DNA sequencing.

by Jay Shendure, Hanlee Ji
Nature Biotechnology (2008)

Abstract

DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection. Over the past three years, massively parallel DNA sequencing platforms have become widely available, reducing the cost of DNA sequencing by over two orders of magnitude, and democratizing the field by putting the sequencing capacity of a major genome center in the hands of individual investigators. These new technologies are rapidly evolving, and near-term challenges include the development of robust protocols for generating sequencing libraries, building effective new approaches to data-analysis, and often a rethinking of experimental design. Next-generation DNA sequencing has the potential to dramatically accelerate biological and biomedical research, by enabling the comprehensive analysis of genomes, transcriptomes and interactomes to become inexpensive, routine and widespread, rather than requiring significant production-scale efforts.


Available from www.ncbi.nlm.nih.gov

Nature Biotechnology, volume 26, number 10, October 2008, page 1135. Published online 9 October 2008; doi:10.1038/nbt1486

Jay Shendure¹ & Hanlee Ji²

¹Department of Genome Sciences, University of Washington, Foege Building S-250, Box 355065, 1705 NE Pacific St., Seattle, Washington 98195-5065, USA. ²Stanford Genome Technology Center and Division of Oncology, Dept. of Medicine, Stanford University School of Medicine, CCSR 3215, 269 Campus Drive, Stanford, California 94305, USA. Correspondence should be addressed to J.S. (shendure@u.washington.edu) or H.J. (genomics_ji@stanford.edu).

Review. © 2008 Nature Publishing Group http://www.nature.com/naturebiotechnology

The field of DNA sequencing technology development has a rich and diverse history [1,2]. However, the overwhelming majority of DNA sequence production to date has relied on some version of the Sanger biochemistry [3]. Over the past five years, the incentive for developing entirely new strategies for DNA sequencing has emerged on at least four levels, undeniably reinvigorating this field (for a review, see ref. 4). First, in the wake of the Human Genome Project, there are few remaining avenues of optimization through which significant reductions in the cost of conventional DNA sequencing can be achieved. Second, the potential utility of short-read sequencing has been tremendously strengthened by the availability of whole genome assemblies for Homo sapiens and all major model organisms, as these effectively provide a reference against which short reads can be mapped. Third, a growing variety of molecular methods have been developed, whereby a broad range of biological phenomena can be assessed by high-throughput DNA sequencing (e.g., genetic variation, RNA expression, protein-DNA interactions and chromosome conformation). And fourth, general progress in technology across disparate fields, including microscopy, surface chemistry, nucleotide biochemistry, polymerase engineering, computation, data storage and others, has made alternative strategies for DNA sequencing increasingly practical to realize.

Here, we review the current crop of next-generation DNA sequencing platforms: how they work, their relative strengths and limitations, and current and emerging applications. We briefly discuss related developments in this field, such as new software tools and front-end methods for isolating arbitrary genomic subsets. We emphasize that the DNA sequencing technology field has become a quickly moving target, and we can at best provide a snapshot of this particular moment.

Sanger sequencing

Since the early 1990s, DNA sequence production has almost exclusively been carried out with capillary-based, semi-automated implementations of the Sanger biochemistry [3,5,6] (Fig. 1a). In high-throughput production pipelines, DNA to be sequenced is prepared by one of two approaches: first, for shotgun de novo sequencing, randomly fragmented DNA is cloned into a high-copy-number plasmid, which is then used to transform Escherichia coli; or second, for targeted resequencing, PCR amplification is carried out with primers that flank the target. The output of both approaches is an amplified template, either as many "clonal" copies of a single plasmid insert present within a spatially isolated bacterial colony that can be picked, or as many PCR amplicons present within a single reaction volume.

The sequencing biochemistry takes place in a "cycle sequencing" reaction, in which cycles of template denaturation, primer annealing and primer extension are performed. The primer is complementary to known sequence immediately flanking the region of interest. Each round of primer extension is stochastically terminated by the incorporation of fluorescently labeled dideoxynucleotides (ddNTPs). In the resulting mixture of end-labeled extension products, the label on the terminating ddNTP of any given fragment corresponds to the nucleotide identity of its terminal position. Sequence is determined by high-resolution electrophoretic separation of the single-stranded, end-labeled extension products in a capillary-based polymer gel. Laser excitation of fluorescent labels as fragments of discrete lengths exit the capillary, coupled to four-color detection of emission spectra, provides the readout that is represented in a Sanger sequencing "trace". Software translates these traces into DNA sequence, while also generating error probabilities for each base-call [7,8]. The approach that is taken for subsequent analysis (for example, genome assembly or variant identification) depends on precisely what is being sequenced and why. Simultaneous electrophoresis in 96 or 384 independent capillaries provides a limited level of parallelization.

After three decades of gradual improvement, the Sanger biochemistry can be applied to achieve read-lengths of up to ~1,000 bp, and per-base "raw" accuracies as high as 99.999%. In the context of high-throughput shotgun genomic sequencing, Sanger sequencing costs on the order of $0.50 per kilobase.
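The chain-termination readout described under Sanger sequencing can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the software the review cites: the function names (`extension_ladder`, `read_ladder`, `phred_quality`) are hypothetical, every possible termination point is enumerated once instead of being sampled stochastically, and the quality conversion assumes the standard Phred definition Q = -10 log10(P_error), which the review itself does not spell out. The example template and primer follow the sequences drawn in Figure 1a.

```python
import math
from typing import List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extension_ladder(template_3to5: str, primer: str) -> List[str]:
    """Return one ddNTP-terminated product per possible termination point.

    The template is written 3'->5', so the newly synthesized strand (5'->3')
    is its base-by-base complement. The primer must match the start of it.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    if not new_strand.startswith(primer):
        raise ValueError("primer does not anneal at the 3' end of the template")
    first_end = len(primer) + 1
    return [new_strand[:end] for end in range(first_end, len(new_strand) + 1)]


def read_ladder(products: List[str]) -> str:
    """Mimic electrophoresis: order products by length, read each end label."""
    return "".join(product[-1] for product in sorted(products, key=len))


def phred_quality(error_probability: float) -> float:
    """Phred-scaled quality for a per-base error probability (assumed scale)."""
    return -10.0 * math.log10(error_probability)


ladder = extension_ladder("GACTAGATACGAGCGTGA", "CTGAT")
print(ladder[0])                       # shortest product: CTGATC
print(read_ladder(ladder))             # bases read downstream of the primer
print(round(phred_quality(1e-5), 6))   # 99.999% accuracy on the assumed scale
```

On this assumed scale, a raw accuracy of 99.999% (error probability 1e-5) corresponds to a quality of about 50. Sorting by product length stands in for the capillary separation; a real trace also has to be deconvolved from four-color signal, which is not modeled here.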
Second-generation DNA sequencing

Alternative strategies for DNA sequencing can be grouped into several categories (as discussed previously in ref. 4). These include (i) microelectrophoretic methods [9] (Box 1), (ii) sequencing by hybridization [10] (Box 2), (iii) real-time observation of single molecules [11,12] (Box 3) and (iv) cyclic-array sequencing (J.S. et al. [13] and ref. 14). Here, we use "second-generation" in reference to the various implementations of cyclic-array sequencing that have recently been realized in a commercial product (e.g., 454 sequencing (used in the 454 Genome Sequencers, Roche Applied Science, Basel), Solexa technology (used in the Illumina (San Diego) Genome Analyzer), the SOLiD platform (Applied Biosystems, Foster City, CA, USA), the Polonator (Dover/Harvard) and the HeliScope Single Molecule Sequencer technology (Helicos, Cambridge, MA, USA)). The concept of cyclic-array sequencing can be summarized as the sequencing of a dense array of DNA features by iterative cycles of enzymatic manipulation and imaging-based data collection [15] (Shendure and colleagues [16]). Two reports in 2005 described the first integrated implementations of cyclic-array strategies that were both practical and cost-competitive with conventional sequencing (J.S. et al. [13] and ref. 14), and other groups have quickly followed [17,18].

Although these platforms are quite diverse in sequencing biochemistry as well as in how the array is generated, their work flows are conceptually similar (Fig. 1b). Library preparation is accomplished by random fragmentation of DNA, followed by in vitro ligation of common adaptor sequences. Alternative protocols can be used to generate jumping libraries of mate-paired tags with controllable distance distributions [13,19]. The generation of clonally clustered amplicons to serve as sequencing features can be achieved by several approaches, including in situ polonies [15], emulsion PCR [20] or bridge PCR [21,22] (Fig. 2). What is common to these methods is that PCR amplicons derived from any given single library molecule end up spatially clustered, either to a single location on a planar substrate (in situ polonies, bridge PCR), or to the surface of micron-scale beads, which can be recovered and arrayed (emulsion PCR).

The sequencing process itself consists of alternating cycles of enzyme-driven biochemistry and imaging-based data acquisition (Fig. 3). The platforms that are discussed here all rely on sequencing by synthesis, that is, serial extension of primed templates, but the enzyme driving the synthesis can be either a polymerase [16,23] or a ligase [13,24]. Data are acquired by imaging of the full array at each cycle (e.g., of fluorescently labeled nucleotides incorporated by a polymerase).

Global advantages of second-generation or cyclic-array strategies, relative to Sanger sequencing, include the following: (i) in vitro construction of a sequencing library, followed by in vitro clonal amplification to generate sequencing features, circumvents several bottlenecks that restrict the parallelism of conventional sequencing (that is, transformation of E. coli and colony picking). (ii) Array-based sequencing enables a much higher degree of parallelism than conventional capillary-based sequencing. As the effective size of sequencing features can be on the order of 1 µm, hundreds of millions of sequencing reads can potentially be obtained in parallel by rastered imaging of a reasonably sized surface area. (iii) Because array features are immobilized to a planar surface, they can be enzymatically manipulated by a single reagent volume. Although microliter-scale reagent volumes are used in practice, these are essentially amortized over the full set of sequencing features on the array, dropping the effective reagent volume per feature to the ...
[Figure 1, image not reproduced. Panel a: DNA fragmentation; in vivo cloning and amplification; cycle sequencing (polymerase, dNTPs, labeled ddNTPs; template 3'-...GACTAGATACGAGCGTGA...-5', primer 5'-...CTGAT, extension products ...CTGATC through ...CTGATCTATGCTCG); electrophoresis (1 read/capillary). Panel b: DNA fragmentation; in vitro adaptor ligation; generation of polony array; cyclic array sequencing (10^6 reads/array), with cycles 1, 2 and 3 labeled "What is base 1?", "What is base 2?" and "What is base 3?".]

Figure 1 Work flow of conventional versus second-generation sequencing. (a) With high-throughput shotgun Sanger sequencing, genomic DNA is fragmented, then cloned to a plasmid vector and used to transform E. coli. For each sequencing reaction, a single bacterial colony is picked and plasmid DNA isolated. Each cycle sequencing reaction takes place within a microliter-scale volume, generating a ladder of ddNTP-terminated, dye-labeled products, which are subjected to high-resolution electrophoretic separation within one of 96 or 384 capillaries in one run of a sequencing instrument. As fluorescently labeled fragments of discrete sizes pass a detector, the four-channel emission spectrum is used to generate a sequencing trace. (b) In shotgun sequencing with cyclic-array methods, common adaptors are ligated to fragmented genomic DNA, which is then subjected to one of several protocols that results in an array of millions of spatially immobilized PCR colonies or "polonies" [15]. Each polony consists of many copies of a single shotgun library fragment. As all polonies are tethered to a planar array, a single microliter-scale reagent volume (e.g., for primer hybridization and then for enzymatic extension reactions) can be applied to manipulate all array features in parallel. Similarly, imaging-based detection of fluorescent labels incorporated with each extension can be used to acquire sequencing data on all features in parallel. Successive iterations of enzymatic interrogation and imaging are used to build up a contiguous sequencing read for each array feature.
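The cyclic-array idea (one reagent step and one image per cycle, with every feature advancing by one base in parallel) can be sketched in Python. This is an illustrative toy model under loud assumptions, not any vendor's pipeline: `run_cycles` and `max_features` are hypothetical names, each feature is represented directly by the string its read should spell out, base calls are error-free, and the capacity estimate assumes a perfectly packed square grid, which real arrays do not achieve.

```python
from typing import List


def run_cycles(features: List[str], n_cycles: int) -> List[str]:
    """Build one read per array feature, one base per cycle.

    Each cycle stands for an enzymatic step applied to the whole array,
    followed by a single imaging pass that records one call per feature.
    """
    reads = ["" for _ in features]
    for cycle in range(n_cycles):
        # "Imaging": collect base number `cycle` from every feature at once.
        # Features shorter than the current cycle contribute no further calls.
        image = [f[cycle] if cycle < len(f) else "" for f in features]
        reads = [read + call for read, call in zip(reads, image)]
    return reads


def max_features(area_cm2: float, pitch_um: float = 1.0) -> int:
    """Upper bound on feature count for a square grid (assumed geometry)."""
    area_um2 = area_cm2 * 1e8  # 1 cm^2 = 10^8 square micrometres
    return int(area_um2 // (pitch_um ** 2))


array = ["AGGAACTTC", "GAGCGCAAT", "TCCGCTGAT"]  # illustrative features only
print(run_cycles(array, 3))   # three cycles yield a 3-base read per feature
print(max_features(3.0))      # 3 cm^2 at 1 um pitch: 300000000
```

The work per cycle is one pass over the array regardless of how many features it holds, which is the source of the parallelism; and with features on the order of 1 µm, a few square centimetres already allow hundreds of millions of features under the idealized packing assumed here.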
