Human genetic variation and its c...
Elucidating the inherited basis of genetic variation in human health and disease is one of the major scientific challenges of the twenty-first century. In 2001 two reference versions of the human genome were published. One was released by the Human Genome Sequencing Consortium and reflected the assembly of sequences derived from numerous donors1, whereas the other, released by Celera Genomics, was a consensus sequence derived from five individuals2. Importantly, both versions represented the human genome as a haploid sequence and genetic variation was not annotated.

In order to study how genetic variants contribute to phenotypic diversity, large-scale studies were initiated to identify and catalogue nucleotides that differ among individuals. Initial studies focused largely on understanding the range of patterns and frequencies of SNPs3–5. As the prevalence and contribution of structural variants to human biology was realized6,7, consortia were formed and systematic studies were conducted to improve our understanding of this class of variants8–10. In 2007, the first complete genome sequence of an individual, J. Craig Venter11, was published, followed shortly thereafter by the publication of a second individual's genome, that of James D. Watson12. Subsequently, two additional genomes from anonymous individuals were sequenced: one Han Chinese (Asian)13 and one Nigerian (African)14. In aggregate, these studies, published after the release of the human genome reference sequence, have rapidly increased our knowledge of the various forms of human genetic variation, their evolutionary histories and the correlations between them. However, our understanding of the locations and frequencies of structural variants across the genome is still limited, and cataloguing these classes of alterations is a high priority.

Genome-wide association (GWA) studies are the most widely used contemporary approach to relate genetic variation to phenotypic diversity15.
Over the past 2 years these studies have identified statistical association between hundreds of loci across the genome and common complex traits. The results of these studies have substantially increased our understanding of the diverse molecular pathways underlying specific human diseases. However, GWA studies have several limitations. First, there is great difficulty moving beyond mere statistical associations to identifying the functional basis of the link between a genomic interval and a given complex trait. Second, SNP associations identified in one population frequently are not transferable to members of other populations. Third, the bulk of the heritable fraction of complex traits has not been accounted for in recent GWA studies. This last point is probably explained by the fact that GWA studies do not capture information about rare variants and have limited statistical power to detect small gene–gene and gene–environment interactions.

The use of new technologies for assaying DNA sequences has provided important insights and raised new questions about the roles that different types of genetic variants have in human health and disease. Here, for each type of genetic variant we discuss their probable contribution to overall genetic variation, the approaches taken to assess their contribution to phenotypic variation and the successes achieved so far. There have been several excellent reviews on structural variation16,17 as well as reviews describing the findings of GWA studies15,18–20.

Scripps Genomic Medicine, Scripps Translational Science Institute and The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA.
Correspondence to K.A.F.
e-mail: kfrazer@scripps.edu
doi:10.1038/nrg2554

Structural variants
Broadly defined, these are all variants that are not single nucleotide variants. They include insertion–deletions, block substitutions, inversions of DNA sequences and copy number differences.
Genome-wide association (GWA) study
An investigation of the association between common genetic variation and disease. This type of analysis requires a dense set of markers (for example, SNPs) that capture a substantial proportion of common variation across the genome, and large numbers of study subjects.

Human genetic variation and its contribution to complex traits
Kelly A. Frazer, Sarah S. Murray, Nicholas J. Schork and Eric J. Topol

Abstract | The last few years have seen extensive efforts to catalogue human genetic variation and correlate it with phenotypic differences. Most common SNPs have now been assessed in genome-wide studies for statistical associations with many complex traits, including many important common diseases. Although these studies have provided new biological insights, only a limited amount of the heritable component of any complex trait has been identified and it remains a challenge to elucidate the functional link between associated variants and phenotypic traits. Technological advances, such as the ability to detect rare and structural variants, and a clear understanding of the challenges in linking different types of variation with phenotype, will be essential for future progress.

REVIEWS
Nature Reviews | Genetics, Volume 10 | April 2009 | 241
© 2009 Macmillan Publishers Limited. All rights reserved
[Figure 1 panels, each illustrated with example DNA sequences: Single nucleotide variant; Structural variants: Insertion–deletion variant, Block substitution, Inversion variant, Copy number variant]

Figure 1 | Classes of human genetic variants. The nomenclature used to describe the various types of structural variants is not yet standard121. Here, the terminology used aims to describe the nucleotide composition of the variant and distinguish it from other types of variants. Single nucleotide variants are DNA sequence variations in which a single nucleotide (A, T, G or C) is altered. Insertion–deletion variants (indels) occur when one or more base pairs are present in some genomes but absent in others. They are generally composed of only a few bases but can be greater than 80 kb in length11. Block substitutions describe cases in which a string of adjacent nucleotides varies between two genomes. An inversion variant is one in which the order of the base pairs is reversed in a defined section of a chromosome. A well-characterized inversion variant that has been described in humans involves a section of chromosome 17 in which a ~900 kb interval is in the reverse order in approximately 20% of individuals with Northern European ancestry122. Copy number variants occur when identical or nearly identical sequences are repeated in some chromosomes but not others. The largest copy number variant identified in the Venter genome11 was almost 2 Mb in length.

Complex traits
Continuously distributed phenotypes that are classically believed to result from the independent action of many genes, environmental factors and gene-by-environment interactions.

Minor allele
The less common allele of a polymorphism.
Linkage disequilibrium (LD)
In population genetics, LD is the nonrandom association of alleles. For example, alleles of SNPs that reside near one another on a chromosome often occur in nonrandom combinations owing to infrequent recombination.

Here we unify the exciting discoveries of these two disciplines into a single review to provide a comprehensive overview of our current knowledge of human genetic variation and where the key challenges lie for future research aimed at understanding the genetic architecture of complex traits.

Classes of human genetic variation
Human genetic variants are typically referred to as either common or rare, to denote the frequency of the minor allele in the human population. Common variants are synonymous with polymorphisms, defined as genetic variants with a minor allele frequency (MAF) of at least 1% in the population, whereas rare variants have a MAF of less than 1%. Genetic variants are also discussed in terms of their nucleotide composition. In the broadest sense, variants in the human genome can be divided into two different nucleotide composition classes: single nucleotide variants and structural variants10 (FIG. 1). The vast majority of genetic variants are hypothesized to be neutral21 (that is, they do not contribute to phenotypic variation), achieving significant frequencies in the human population simply by chance. However, the relative percentage of neutral, near-neutral22 and non-neutral variants remains to be empirically determined.

Single nucleotide variants. SNPs are the most prevalent class of genetic variation among individuals. On the basis of survey sequencing results it has been estimated that the human genome contains at least 11 million SNPs, with ~7 million of these occurring with a MAF of over 5%23 and the remaining having MAFs between 1 and 5%.
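As a minimal sketch of the definitions above, the Python below computes a minor allele frequency, applies the 1% threshold that separates common from rare variants, and computes the standard r² measure of LD between two SNPs. It assumes biallelic sites and phased haplotypes coded as 0/1, which are simplifications made for this illustration only; the function names are hypothetical and do not come from any study or tool cited here.

```python
def minor_allele_frequency(n_allele_a, n_allele_b):
    """MAF at a biallelic site, given the number of chromosomes carrying each allele."""
    total = n_allele_a + n_allele_b
    if total == 0:
        raise ValueError("no chromosomes observed")
    return min(n_allele_a, n_allele_b) / total


def classify_variant(maf, threshold=0.01):
    """'common' (a polymorphism) if MAF is at least 1%, otherwise 'rare'."""
    return "common" if maf >= threshold else "rare"


def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per chromosome, where a and b are
    0 or 1 and mark which allele is carried at SNP A and SNP B (assumed phased).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # deviation from random association
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                     # at least one site is monomorphic
    return d * d / denom


# 20 of 1,000 chromosomes carry the minor allele: MAF = 0.02
print(classify_variant(minor_allele_frequency(980, 20)))   # common
# Alleles at the two SNPs always travel together
print(ld_r2([(0, 0), (0, 0), (1, 1), (1, 1)]))              # 1.0
```

Real data are usually unphased genotypes, in which case haplotype frequencies have to be estimated before r² can be computed; that step is outside this sketch.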
Analysis of the four fully sequenced individual genomes suggests that these original estimates are fairly accurate and that most SNPs have been identified and information about them deposited in the Single Nucleotide Polymorphism database (dbSNP) (BOX 1). In addition to SNPs there are innumerable rare and novel or 'de novo' single nucleotide variants, in some cases segregating only in a nuclear family or a single individual. For instance, any base pair that, when altered, is compatible with life is likely to be found in at least one of the ~6.7 billion people on Earth. However, it is important to note that in any given individual the majority of variants are those that are common in the population as a whole (BOX 1). Furthermore, when the genomes of two individuals are compared, the majority of the base pairs that differ are at positions with variants that are common in the population.

The alleles of SNPs located in the same genomic interval are often correlated with one another. This correlation structure, or linkage disequilibrium (LD)24, varies in a complex and unpredictable manner across the genome and between different populations. The efforts of Phase I of the International HapMap Project3, along with those of Perlegen Sciences5, paved the way for breaking the genome down into groups of highly correlated SNPs that are generally inherited together (known as LD bins). From Phase II of the International HapMap Project4 it was determined that the vast majority of SNPs with a MAF of at least 5% could be reduced to ~550,000 LD bins for individuals of European or Asian ancestry and to 1,100,000 LD bins for individuals of African ancestry (r² ≥ 0.8). By genotyping the DNA sample of an individual with a 'tagging' SNP from each LD bin, knowledge regarding over 80% of SNPs present at a frequency above 5% across the genome is gained25–28.

Structural variants.
Structural variation, broadly defined, refers to all base pairs that differ between individuals and that are not single nucleotide variants. Such variation includes insertion–deletions (indels), block substitutions, inversions of DNA sequences and copy number differences (FIG. 1). Compared with single nucleotide variants, the technological ability to detect structural variants in the human genome has only recently emerged8,10,29–32. Hence our understanding of the locations and frequencies of structural variants, and our ability to assay their association with complex traits, is still maturing33–38. Analysis of the four fully sequenced human genomes (BOX 1) combined with targeted sequencing of structural variants greater than 8 kb in length in eight human genomes9 has provided tremendous insight. These studies suggest that structural variation accounts for at least 20% of all genetic variants in humans and underlies greater than 70% of the variant bases. Altogether, for any given individual, structural variants constitute between 9 and 25 Mb of the genome (~0.5 to 1%), underscoring the important roles of this class of variation in genome evolution and in human health and disease.

LD patterns of common structural variants
There has been conflicting initial evidence regarding whether the alleles of structural polymorphisms are in LD with SNPs, and are therefore assayed by proxy