RNA-Seq: a revolutionary tool for transcriptomics.
Nature Reviews Genetics (2009)
- PubMed: 19015660
Available from www.pubmedcentral.nih.gov
Abstract
RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.
The transcriptome is the complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition. Understanding the transcriptome is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells and tissues, and also for understanding development and disease. The key aims of transcriptomics are: to catalogue all species of transcript, including mRNAs, non-coding RNAs and small RNAs; to determine the transcriptional structure of genes, in terms of their start sites, 5′ and 3′ ends, splicing patterns and other post-transcriptional modifications; and to quantify the changing expression levels of each transcript during development and under different conditions.
Various technologies have been developed to deduce and quantify the transcriptome, including hybridization- or sequence-based approaches. Hybridization-based approaches typically involve incubating fluorescently labelled cDNA with custom-made microarrays or commercial high-density oligo microarrays. Specialized microarrays have also been designed; for example, arrays with probes spanning exon junctions can be used to detect and quantify distinct spliced isoforms [1]. Genomic tiling microarrays that represent the genome at high density have been constructed and allow the mapping of transcribed regions to a very high resolution, from several base pairs to ~100 bp [2–5]. Hybridization-based approaches are high throughput and relatively inexpensive, except for high-resolution tiling arrays that interrogate large genomes. However, these methods have several limitations, which include: reliance upon existing knowledge about genome sequence; high background levels owing to cross-hybridization [6,7]; and a limited dynamic range of detection owing to both background and saturation of signals.
Moreover, comparing expression levels across different experiments is often difficult and can require complicated normalization methods.
In contrast to microarray methods, sequence-based approaches directly determine the cDNA sequence. Initially, Sanger sequencing of cDNA or EST libraries [8,9] was used, but this approach is relatively low throughput, expensive and generally not quantitative. Tag-based methods were developed to overcome these limitations, including serial analysis of gene expression (SAGE) [10,11], cap analysis of gene expression (CAGE) [12–14] and massively parallel signature sequencing (MPSS) [15–17]. These tag-based sequencing approaches are high throughput and can provide precise, ‘digital’ gene expression levels. However, most are based on expensive Sanger sequencing technology, and a significant portion of the short tags cannot be uniquely mapped to the reference genome. Moreover, only a portion of the transcript is analysed and isoforms are generally indistinguishable from each other. These disadvantages limit the use of traditional sequencing technology in annotating the structure of transcriptomes.
Recently, the development of novel high-throughput DNA sequencing methods has provided a new method for both mapping and quantifying transcriptomes. This method, termed RNA-Seq (RNA sequencing), has clear advantages over existing approaches and is expected to revolutionize the manner in which eukaryotic transcriptomes are analysed. It has already been applied to Saccharomyces cerevisiae, Schizosaccharomyces pombe, Arabidopsis thaliana, mouse and human cells [18–24]. Here, we explain how RNA-Seq works, discuss its challenges and provide an overview of studies that have used this approach, which have already begun to change our view of eukaryotic transcriptomes.
RNA-Seq technology and benefits
RNA-Seq uses recently developed deep-sequencing technologies.
In general, a population of RNA (total or fractionated, such as poly(A)+) is converted to a library of cDNA fragments with adaptors attached to one or both ends (FIG. 1). Each molecule, with or without amplification, is then sequenced in a high-throughput manner to obtain short sequences from one end (single-end sequencing) or both ends (paired-end sequencing). The reads are typically 30–400 bp, depending on the DNA-sequencing technology used. In principle, any high-throughput sequencing technology [25] can be used for RNA-Seq, and the Illumina IG [18–21,23,24], Applied Biosystems SOLiD [22] and Roche 454 Life Science [26–28] systems have already been applied for this purpose. The Helicos Biosciences tSMS system has not yet been used for published RNA-Seq studies, but is also appropriate and has the added advantage of avoiding amplification of target cDNA.
Zhong Wang, Mark Gerstein and Michael Snyder. Nature Reviews Genetics (Perspectives: Innovation), advance online publication, published online 18 November 2008. doi:10.1038/nrg2484
Following sequencing, the resulting reads are either aligned to a reference genome or reference transcripts, or assembled de novo without the genomic sequence to produce a genome-scale transcription map that consists of the transcriptional structure and/or level of expression for each gene.
Although RNA-Seq is still a technology under active development, it offers several key advantages over existing technologies (Table 1). First, unlike hybridization-based approaches, RNA-Seq is not limited to detecting transcripts that correspond to existing genomic sequence. For example, 454-based RNA-Seq has been used to sequence the transcriptome of the Glanville fritillary butterfly [27]. This makes RNA-Seq particularly attractive for non-model organisms with genomic sequences that are yet to be determined. RNA-Seq can reveal the precise location of transcription boundaries, to a single-base resolution. Furthermore, 30-bp short reads from RNA-Seq give information about how two exons are connected, whereas longer reads or paired-end short reads should reveal connectivity between multiple exons. These factors make RNA-Seq useful for studying complex transcriptomes. In addition, RNA-Seq can also reveal sequence variations (for example, SNPs) in the transcribed regions [22,24].
A second advantage of RNA-Seq relative to DNA microarrays is that RNA-Seq has very low, if any, background signal because DNA sequences can be unambiguously mapped to unique regions of the genome. RNA-Seq does not have an upper limit for quantification, which correlates with the number of sequences obtained.
Consequently, it has a large dynamic range of expression levels over which transcripts can be detected: a greater than 9,000-fold range was estimated in a study that analysed 16 million mapped reads in Saccharomyces cerevisiae [18], and a range spanning five orders of magnitude was estimated for 40 million mouse sequence reads [20]. By contrast, DNA microarrays lack sensitivity for genes expressed either at low or very high levels and therefore have a much smaller dynamic range (one-hundredfold to a few-hundredfold) (FIG. 2). RNA-Seq has also been shown to be highly accurate for quantifying expression levels, as determined using quantitative PCR (qPCR) [18] and spike-in RNA controls of known concentration [20]. The results of RNA-Seq also show high levels of reproducibility, for both technical and biological replicates [18,22]. Finally, because there are no cloning steps, and with the Helicos technology there is no amplification step, RNA-Seq requires less RNA sample.
Taking all of these advantages into account, RNA-Seq is the first sequencing-based method that allows the entire transcriptome to be surveyed in a very high-throughput and quantitative manner. This method offers both single-base resolution for annotation and ‘digital’ gene expression levels at the genome scale, often at a much lower cost than either tiling arrays or large-scale Sanger EST sequencing.
Challenges for RNA-Seq
Library construction. The ideal method for transcriptomics should be able to directly identify and quantify all RNAs, small or large. Although there are only a few steps in RNA-Seq (FIG. 1), it does involve several manipulation stages during the production of cDNA libraries, which can complicate its use in profiling all types of transcript.
Unlike small RNAs (microRNAs (miRNAs), Piwi-interacting RNAs (piRNAs), short interfering RNAs (siRNAs) and many others), which can be directly sequenced after adaptor ligation, larger RNA molecules must be fragmented into smaller pieces (200–500 bp) to be compatible with most deep-sequencing technologies. Common fragmentation methods include
Figure 1 | A typical RNA-Seq experiment. Briefly, long RNAs are first converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation (see main text). Sequencing adaptors (blue) are subsequently added to each cDNA fragment and a short sequence is obtained from each cDNA using high-throughput sequencing technology. The resulting sequence reads are aligned with the reference genome or transcriptome, and classified as three types: exonic reads, junction reads and poly(A) end-reads. These three types are used to generate a base-resolution expression profile for each gene, as illustrated at the bottom; a yeast ORF with one intron is shown.
[Figure 1 panel labels: mRNA; cDNA or RNA fragments; EST library with adaptors; short sequence reads; mapped sequence reads (exonic reads, junction reads, poly(A) end reads); base-resolution expression profile, with nucleotide position on the x-axis and RNA expression level on the y-axis.]
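The read classification and base-resolution profile described in the Figure 1 legend can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the procedure used in any of the cited studies: the `AlignedRead` record, its `unaligned_tail` field, the `min_poly_a` threshold and the coordinates of the two-exon gene are all hypothetical, and a real analysis would take alignments from a read aligner rather than hand-written intervals.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Toy alignment record (hypothetical, for illustration only)."""
    blocks: list               # (start, end) half-open genomic intervals
    unaligned_tail: str = ""   # bases left unmatched at the 3' end of the read


def classify(read, min_poly_a=4):
    """Assign a read to one of the three types named in the Figure 1 legend."""
    tail = read.unaligned_tail.upper()
    # Assumption: an unmatched tail made only of A's marks a poly(A) end read.
    if len(tail) >= min_poly_a and set(tail) == {"A"}:
        return "poly(A) end"
    # Assumption: a read aligned in more than one block spans an exon junction.
    if len(read.blocks) > 1:
        return "junction"
    return "exonic"


def coverage_profile(reads, start, end):
    """Count how many aligned blocks cover each base in [start, end)."""
    depth = [0] * (end - start)
    for read in reads:
        for block_start, block_end in read.blocks:
            for pos in range(max(block_start, start), min(block_end, end)):
                depth[pos - start] += 1
    return depth


# Toy gene: exon 1 = [0, 10), intron = [10, 15), exon 2 = [15, 20).
reads = [
    AlignedRead([(0, 5)]),
    AlignedRead([(2, 7)]),
    AlignedRead([(8, 10), (15, 18)]),                   # skips the intron
    AlignedRead([(16, 20)], unaligned_tail="AAAAAA"),   # runs into the poly(A) tail
]

counts = Counter(classify(r) for r in reads)
profile = coverage_profile(reads, 0, 20)

print(dict(counts))
print(profile)
```

In this toy gene the intron positions receive no coverage, because the junction read contributes only its two exonic blocks. That is how a spliced intron shows up as a gap in a base-resolution profile.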




