Comparing whole genomes using DNA...
Since the early 1990s a large amount of effort has focused on determining the complete genomic DNA sequence of many diverse organisms. Remarkably, virtually all this sequencing has been done using a single method: chain termination sequencing using dideoxynucleosides1, usually referred to as Sanger sequencing. From the determination of the first complete genome sequence of an organism, the bacteriophage φX174 (Ref. 2), to the completion of 95% of the human genome sequence3,4, many technical advances in methodologies, automation and computing rapidly increased the rate at which DNA sequence was obtained5. The availability of genomic sequences has led to the development of many genome-scale analytical techniques that have greatly enriched modern biology (for example, techniques to measure global mRNA abundance6,7, systematically knock out all genes8, perturb their function9 and generate comprehensive clone collections10,11), and together they constitute the new field of genomics. Studies of sequence variation in the same or similar species have many potential applications, ranging from understanding complex human diseases to analysing the products of experimental evolution. However, to realize fully the potential of this science, the task of characterizing genomes should be reduced to a routine procedure that can be done on hundreds of samples. To this end, the research and commercial community is accelerating towards new approaches to genome sequencing12–14 that are increasingly less expensive, more rapid and efficient, and more widely available. Nevertheless, determining even modest numbers of complete genomic sequences is still a substantial undertaking, entailing equipment, infrastructure and running expenses beyond the resources of most individual laboratories, and the study of sequence variation through direct genome sequencing remains the province of a minority of biologists.
The applications that are envisioned for cheap and rapid sequencing technology do not actually require repeated determination of entire genomic sequences. Methods that efficiently detect genomic differences, be they structural rearrangements, polymorphisms or mutations, often suffice to reduce the sequencing requirement to a tiny fraction of the genome, a capability that is routine in most modern biology laboratories. Several technologies that use hybridization to DNA microarrays are effective for detecting genomic variation in closely related samples. Thus, questions in which a researcher aims to compare normal and diseased tissues from the same individual or mutant and wild-type DNA from the same experimental organism can often be addressed by microarray-based experimental comparison as opposed to exhaustive sequencing of entire genomes. This Review is focused on the global characterization of differences between closely related genomes, an approach that is ideally suited to microarrays. We describe the various forms of genomic variation that can be detected using microarray-based approaches and discuss some of the important experimental considerations, ranging from experimental design to data analysis and visualization. We highlight the versatility of these approaches, their applicability to various questions and organisms, and briefly describe how global views of genomic diversity are revealing new biological insights.
*Lewis–Sigler Institute for Integrative Genomics and ‡Department of Molecular Biology, Carl Icahn Laboratory, Princeton University, Princeton, New Jersey, 08544, USA. e-mails: dgresham@genomics.princeton.edu; maitreya@princeton.edu; botstein@genomics.princeton.edu. doi:10.1038/nrg2335
Experimental evolution: The long-term selection of microorganisms or populations under laboratory conditions to model simple evolutionary scenarios.
Detect: The identification of a genomic variant, the actual state of which is not known until further analysis.
Comparing whole genomes using DNA microarrays
David Gresham*‡, Maitreya J. Dunham* and David Botstein*‡
Abstract | The rapid accumulation of complete genomic sequences offers the opportunity to carry out an analysis of inter- and intra-individual genome variation within a species on a routine basis. Sequencing whole genomes requires resources that are currently beyond those of a single laboratory and therefore it is not a practical approach for resequencing hundreds of individual genomes. DNA microarrays present an alternative way to study differences between closely related genomes. Advances in microarray-based approaches have enabled the main forms of genomic variation (amplifications, deletions, insertions, rearrangements and base-pair changes) to be detected using techniques that are readily performed in individual laboratories using simple experimental approaches.
REVIEWS. Nature Reviews | Genetics, Volume 9 | April 2008 | 291. © 2008 Nature Publishing Group
DNA probe: In the context of microarrays, DNA probe refers to the DNA oligonucleotide, PCR product or genomic clone that is attached to a microarray in order to probe a labelled genomic DNA sample that is added in solution. In the context of Southern blotting, DNA probe refers to the labelled DNA oligonucleotide that is added in solution to probe the genomic DNA sample that is immobilized on a membrane.
Photolithography: The use of masks to selectively deprotect nascent oligonucleotides using light, allowing the parallel synthesis of millions of probes.
Ink-jet deposition: The use of print cartridge heads to deposit one of the four DNA bases at a probe site on the microarray.
Fluorescent in situ hybridization (FISH): A technique in which a fluorescently labelled DNA probe is used to detect a particular chromosome or gene using fluorescence microscopy.
Quantitative PCR: A procedure in which the products of a PCR reaction are measured by monitoring the signal that is produced by a fluorescent dye, which accumulates during each PCR cycle.
Tm: The Tm (melting temperature) of an oligonucleotide is the temperature at which 50% of the duplex strands are separated.
Hybridization technology
DNA microarrays are a collection of DNA probes that are arrayed on a solid support and are used to assay, through hybridization, the presence of complementary DNA in a sample (see BOXES 1,2). The experimental conditions for annealing complementary strands of DNA were reported15 within a decade of the determination of the structure of DNA, and it was quickly realized that in vitro hybridization of DNA presented a means for comparing genomes.
One early example was the visualization of hybridization products between two entire bacteriophage genomes using electron microscopy16. The development of blotting techniques, which use labelled DNA probes for visualization, presaged the fabrication of synthetic nucleotides on a solid support17,18. Moreover, the effect of single mismatches on hybridization efficiency was soon appreciated and was used to detect mutations in bacteriophage19 and human DNA20 well before the advent of DNA microarrays. DNA microarrays are made either by chemically synthesizing DNA probes on a solid surface or by attaching pre-made DNA probes to a solid surface. Maskos and Southern21 first demonstrated the synthesis of arrays of oligonucleotides on a solid support in situ. From these initial experiments, advances in technology and chemistry resulted in increasingly higher density oligonucleotide microarrays synthesized in situ using techniques such as photolithography22 and ink-jet deposition23. Simultaneously, the development of printing techniques24 allowed the robotic arraying of PCR products, pre-synthesized oligonucleotides, or genomic clones such as cDNA or BAC clones; the resulting arrays are often referred to as spotted microarrays. For genomic analysis a tiling array design is desirable, in which DNA probes are chosen from contiguous stretches of the genome. Whereas only short-oligonucleotide microarrays are appropriate for detecting sequence changes, all types of microarray can be used to detect structural variation. An important distinction is between microarrays that provide truly comprehensive coverage of the genome (whole genome) and those that provide partial coverage across the genome (genome scale). Until recently, whole-genome coverage using oligonucleotide arrays had only been available for small genomes such as those of viruses25 or the human mitochondrion26,27. At present, whole-genome coverage of larger genomes can only be achieved using large probes such as BACs28,29.
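The distinction between whole-genome and genome-scale coverage can be made concrete with a minimal sketch. The function names, the toy genome length and the probe spacings below are hypothetical choices for illustration and are not taken from any array design described in this Review. Probes are placed at a fixed step along a linear sequence; when the step is no larger than the probe length the tiles overlap or abut and every base is covered, whereas a larger step leaves gaps.

```python
def tile_probes(genome_length, probe_length, step):
    """Return half-open (start, end) intervals for probes placed every `step` bases."""
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = genome_length - probe_length
    return [(s, s + probe_length) for s in range(0, last_start + 1, step)]


def covered_fraction(genome_length, probes):
    """Fraction of bases inside at least one probe (probes must be sorted by start)."""
    covered = 0
    reach = 0  # rightmost position already counted
    for start, end in probes:
        if end > reach:
            covered += end - max(start, reach)
            reach = end
    return covered / genome_length


# Hypothetical toy genome of 1,000 bases and 25-base probes.
toy_genome_length = 1000
dense = tile_probes(toy_genome_length, probe_length=25, step=5)     # overlapping tiles
sparse = tile_probes(toy_genome_length, probe_length=25, step=100)  # gaps between probes

print(len(dense), covered_fraction(toy_genome_length, dense))    # whole-genome style
print(len(sparse), covered_fraction(toy_genome_length, sparse))  # genome-scale style
```

In this toy case the dense design needs roughly twenty times as many probes as the sparse one, which is the trade-off between probe count and coverage that array density has to absorb.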
However, advances in engineering and chemistry, largely made by the commercial manufacturers of microarrays, have enabled the construction of increasingly dense oligonucleotide arrays with 10^5–10^6 probes per microarray. Thus, it is already possible to manufacture short-oligonucleotide microarrays that cover the entire (although relatively small) genomes of eukaryotic organisms such as Saccharomyces cerevisiae30,31 and larger genomes such as Arabidopsis thaliana32,33. Using dozens of arrays, complete coverage of even mammalian genome sequences has been achieved quite recently34. The availability of high-density microarrays has facilitated the development of rapid and comprehensive approaches to characterizing genomes. These methods are being applied to a myriad of questions, from explaining the genetic basis of phenotypic variation to describing the extent and nature of genomic diversity.
Detection of structural variation
Structural variation in the genome refers to microscopic and submicroscopic alterations of the genome and includes deletions and duplications, copy number variation (CNV), insertions, inversions and chromosomal translocations35. This broad class of variants constitutes a diverse and pervasive source of variation with known functional consequences, including increased pathogenicity and antibiotic resistance of microorganisms36, a range of human developmental disorders37 and association with human cancers38. In contrast to targeted methods for detecting structural variation, such as fluorescent in situ hybridization (FISH) and quantitative PCR, microarray-based approaches allow structural variation to be assessed across the entire genome in an unbiased manner. The approach that is
Box 1 | The chemical basis of genome comparison
As with all intermolecular reactions, the rate of formation of the DNA duplex that is formed between the probe and the sample is a function of both the concentration of reactants and temperature.
To use hybridization to compare genomes at the sequence level it is necessary to maximize the difference between the Tm of the perfectly matched DNA and the Tm of the mismatched DNA. This difference is highly dependent on the length of the oligonucleotide and in practice is likely to only be within the range of detection for oligonucleotides that are shorter than 50 bp. Therefore, short probes are required to interrogate sequence differences between genomes. Longer probes (such as those provided by BAC clones, cDNA clones or PCR products) provide greater coverage of the genome and allow detection of structural variation, even in the presence of a small number of sequence differences (see table). IV, insertional variation; SV, structural variation.
Probe type | Probe size | Use | Benefits | Limitation
BACs | 100 kb | SV, IV | Whole genome coverage | Low resolution
PCR products | 1 kb | SV, IV | Higher resolution | Low coverage
cDNA clones | 1–2 kb | SV, IV | Higher resolution | Low coverage
Spotted oligonucleotides | 70mer | SV, IV | Sensitive to sequence variation | Low coverage
In situ synthesized oligonucleotides | 20–60mer | SV, IV, Sequence analysis | |
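The length dependence described in Box 1 can be illustrated with a rough calculation. The sketch below assumes a widely quoted rule of thumb, which is not taken from this Review, that duplex Tm falls by about 1 °C for each 1% of mismatched bases; the function name and the default of 1.0 °C per percent are hypothetical, and real short oligonucleotides are better described by nearest-neighbour thermodynamic models.

```python
def mismatch_tm_drop(probe_length, mismatches=1, degrees_per_percent=1.0):
    """Approximate Tm reduction in degrees C caused by mismatches in a duplex.

    Assumption (rule of thumb, not from the source text): Tm drops by roughly
    `degrees_per_percent` degrees C for every 1% of mismatched positions.
    """
    if probe_length <= 0:
        raise ValueError("probe_length must be positive")
    percent_mismatch = 100.0 * mismatches / probe_length
    return degrees_per_percent * percent_mismatch


# Probe sizes loosely matching the probe types listed in the Box 1 table.
for length in (25, 50, 70, 1000, 100_000):
    drop = mismatch_tm_drop(length)
    print(f"{length:>7} nt probe, one mismatch: about {drop:.3f} degrees C lower Tm")
```

Under this assumption a single mismatch shifts the Tm of a 25mer by several degrees but that of a 1 kb PCR product by about a tenth of a degree, which is consistent with the statement that only short probes can interrogate sequence differences while long probes tolerate them.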