
Linkage disequilibrium--understanding the evolutionary past and mapping the medical future.

by Montgomery Slatkin
Nature Reviews Genetics (2008)

Abstract

Linkage disequilibrium, the nonrandom association of alleles at different loci, is a sensitive indicator of the population genetic forces that structure a genome. Because of the explosive growth of methods for assessing genetic variation at a fine scale, evolutionary biologists and human geneticists are increasingly exploiting linkage disequilibrium in order to understand past evolutionary and demographic events, to map genes that are associated with quantitative characters and inherited diseases, and to understand the joint evolution of linked sets of genes. This article introduces linkage disequilibrium, reviews the population genetic processes that affect it and describes some of its uses. At present, linkage disequilibrium is used much more extensively in the study of humans than in non-humans, but that is changing as technological advances make extensive genomic studies feasible in other species.


Available from Nature Reviews Genetics

Linkage disequilibrium (LD) is one of those unfortunate terms that does not reveal its meaning. As every instructor of population genetics knows, the term is a barrier, not an aid, to understanding. LD means simply a nonrandom association of alleles at two or more loci, and detecting LD does not ensure either linkage or a lack of equilibrium. The term was first used in 1960 by Lewontin and Kojima (ref. 1) and it persists because LD was initially the concern of population geneticists who were not picky about terminology as long as the mathematical definition was clear. At first, there were few data with which to study LD, and its importance to evolutionary biology and human genetics was unrecognized outside of population genetics. However, interest in LD grew rapidly in the 1980s once the usefulness of LD for gene mapping became evident and large-scale surveys of closely linked loci became feasible. By then, the term was too well established to be replaced.

LD is of importance in evolutionary biology and human genetics because so many factors affect it and are affected by it. LD provides information about past events and it constrains the potential response to both natural and artificial selection. LD throughout the genome reflects the population history, the breeding system and the pattern of geographic subdivision, whereas LD in each genomic region reflects the history of natural selection, gene conversion, mutation and other forces that cause gene-frequency evolution. How these factors affect LD between a particular pair of loci or in a genomic region depends on local recombination rates. The population genetics theory of LD is well developed and is being widely used to provide insight into evolutionary history and as the basis for mapping genes in humans and in other species.
In this article, I will review the definitions of LD and the problems with assessing it, then outline the basic population genetics of LD that tells us how natural selection, genetic drift, recombination and mutation all affect levels of LD, and finally discuss some recent applications of LD to mapping genes, inferring the intensity of selection in the genome and estimating allele age.

Department of Integrative Biology, University of California, Berkeley, California 94720-3140, USA.
e-mail: slatkin@berkeley.edu
doi:10.1038/nrg2361
Published online 22 April 2008
Fundamental concepts in genetics | Reviews | Nature Reviews Genetics, Volume 9, June 2008, p. 477 | © 2008 Nature Publishing Group

Definitions

One pair of loci. LD between alleles at two loci has been defined in many ways (BOX 1), but all definitions depend on the quantity:

$$D_{AB} = p_{AB} - p_A p_B \qquad (1)$$

which is the difference between the frequency of gametes carrying the pair of alleles A and B at two loci ($p_{AB}$) and the product of the frequencies of those alleles ($p_A$ and $p_B$). Originally, the definition was in terms of gamete frequencies because that allows for the possibility that the loci are on different chromosomes. The usual application now is to loci on the same chromosome, in which case the allele pair AB is called a haplotype and $p_{AB}$ is the haplotype frequency. As defined, $D_{AB}$ characterizes a population; in practice, $D_{AB}$ is estimated from allele and haplotype frequencies in a sample. Standard sampling theory has to be applied to find the confidence intervals of estimated values (ref. 2).

The quantity $D_{AB}$ is the coefficient of linkage disequilibrium. It is defined for a specific pair of alleles, A and B, and does not depend on how many other alleles are at the two loci; each pair of alleles has its own D. The values for different pairs of alleles are constrained by the fact that the allele frequencies at both loci and the haplotype frequencies have to add up to 1. If both loci are diallelic, as is the case with virtually all SNPs, the constraint is strong enough that only one value of D is needed to characterize LD between those loci. In fact, $D_{AB} = -D_{Ab} = -D_{aB} = D_{ab}$, where a and b are the other alleles. In this case, D is used without a subscript. The sign of D is arbitrary and depends on which pair of alleles one starts with.

If either locus has more than two alleles, no single statistic quantifies the overall LD between them. Although several have been suggested (refs 3,4), none has gained wide acceptance. Such a statistic is needed when both loci have numerous alleles, as is the case for many loci in the major histocompatibility complex in vertebrates, which have dozens or even hundreds of alleles, or for microsatellite loci, which often have 10 to 20 alleles. If there is no one pair of alleles of particular interest, the question is often whether there is more LD between one pair of loci than another pair, or more LD between a pair of loci in one species than in another (refs 5,6).

Linkage equilibrium. If D = 0 there is linkage equilibrium (LE), which has similarities to the Hardy–Weinberg equilibrium (HWE) in implying statistical independence. When genotypes at a single locus are at HWE, whether an allele is present on one chromosome is independent of whether it is present on the homologue. Consequently, the frequency of the AA homozygote is the square of the frequency of A ($p_{AA} = p_A^2$) and the frequency of the Aa heterozygote is twice the product of $p_A$ and $p_a$, the two being necessary to allow for both Aa and aA. The essential feature of HWE is that, regardless of the initial genotype frequencies, HWE is established in one generation of random mating. Any initial deviation from HWE disappears immediately. Significant departures from HWE indicate something interesting is going on, for example, extensive inbreeding, strong selection or genotyping error.

LE is similar to HWE because it implies that alleles at different loci are randomly associated. The frequency of the AB haplotype is the product of the allele frequencies ($p_A p_B$). LE differs from HWE, however, because it is not established in one generation of random mating. Instead, D decreases at a rate that depends on the recombination frequency, c, between the two loci:

$$D_{AB}(t + 1) = (1 - c)\, D_{AB}(t) \qquad (2)$$

where t is time in generations. Even for unlinked loci (c = 0.5), D decreases only by a factor of a half each generation, something proved by Weinberg (ref. 7) in 1909. The general formula was obtained first by Jennings (ref. 8). Although LE will eventually be reached, it will occur slowly for closely linked loci. That is the basis for the uses of LD discussed in later sections. Other population genetic forces, including selection, gene flow, genetic drift and mutation, all affect D, so substantial LD will persist under many conditions.
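Equations 1 and 2 are simple enough to check numerically. The following Python sketch is an illustration only, not taken from the article: the function names, the haplotype labels and the example frequencies are all hypothetical. It computes D for every allele pair at two diallelic loci, which shows the identity $D_{AB} = -D_{Ab} = -D_{aB} = D_{ab}$, and then iterates the decay of D under recombination alone, ignoring selection, drift, gene flow and mutation.

```python
import math


def ld_coefficients(hap_freqs):
    """Return D for each of the four allele pairs at two diallelic loci.

    hap_freqs maps the haplotype labels 'AB', 'Ab', 'aB', 'ab' (names chosen
    for this example) to population frequencies that sum to 1.
    """
    if not math.isclose(sum(hap_freqs.values()), 1.0):
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are the marginal sums of haplotype frequencies.
    p_A = hap_freqs["AB"] + hap_freqs["Ab"]
    p_B = hap_freqs["AB"] + hap_freqs["aB"]
    allele = {"A": p_A, "a": 1 - p_A, "B": p_B, "b": 1 - p_B}
    # Equation 1 applied to each pair: D_xy = p_xy - p_x * p_y.
    return {h: f - allele[h[0]] * allele[h[1]] for h, f in hap_freqs.items()}


def decay(d0, c, generations):
    """Iterate equation 2: D(t + 1) = (1 - c) * D(t), starting from d0."""
    return [d0 * (1 - c) ** t for t in range(generations + 1)]


def half_life(c):
    """Generations needed for D to halve; requires 0 < c < 1."""
    return math.log(0.5) / math.log(1 - c)


if __name__ == "__main__":
    # Hypothetical haplotype frequencies, for illustration only.
    freqs = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
    print(ld_coefficients(freqs))
    print(decay(0.1, 0.5, 3))      # unlinked loci: D halves each generation
    print(half_life(0.01))         # closely linked loci decay far more slowly
```

With c = 0.5 the half-life is exactly one generation, whereas with c = 0.01 it is roughly 69 generations, which is the sense in which LE is approached slowly for closely linked loci.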
Now that very large numbers of polymorphic loci can be surveyed, the extent of LD in a genome can be quantified with great precision, allowing a fine-scale analysis of forces governing genomic variation.

The coefficient of LD and related quantities are descriptive statistics. Their magnitude does not indicate whether or not there is a statistically significant association between alleles in haplotypes. Standard statistical tests, including the chi-squared and Fisher's exact test, are commonly used to test for significance (ref. 2).

Haplotype phase

D and related statistics implicitly assume that haploid individuals or gametes can be typed. But often, only diploid genotypes and not haplotypes can be determined. That is the case with all SNP surveys, other than those of the X chromosome in males (assuming males are the heterogametic sex) or when haploids can be typed. The problem is sketched in BOX 2. The extent of LD in genotypic data (refs 2,9) can be quantified, but the lack of information about the haplotype phase weakens the signal of nonrandom association sufficiently that this approach is not often taken. It is more common to use a statistical method based on population genetics theory (BOX 2) to infer haplotype phase from genotypic data and then to treat the inferred haplotypes as if they were data.

Box 1 | Definitions of LD

Different definitions of linkage disequilibrium (LD) have been proposed because they capture different features of nonrandom association. All of them are related to D, which is defined in equation 1 in the text. Although D completely characterizes the extent to which two alleles, A and B, are nonrandomly associated, it is often not the best statistic to use when comparing LD at different pairs of loci because the range of possible values of D for each pair is constrained by the allele frequencies. The smallest possible value, $D_{min}$, is the less negative value of $-p_A p_B$ and $-(1 - p_A)(1 - p_B)$, where $p_A$ and $p_B$ are the frequencies of alleles A and B. The largest possible value, $D_{max}$, is the smaller of $p_A (1 - p_B)$ and $p_B (1 - p_A)$.

Lewontin (ref. 119) defined D′ to be the ratio of D to its maximum possible absolute value, given the allele frequencies. This definition has the convenient property that when D′ = 1 it indicates that at least one of the four possible haplotypes is absent, regardless of the allele frequencies (BOX 2), a situation commonly described as a 'perfect' disequilibrium.

Another commonly used way to quantify LD is with $r^2$:

$$r^2 = \frac{D^2}{p_A (1 - p_A)\, p_B (1 - p_B)}$$

which is the squared correlation coefficient of 1/0 (all or none) indicator variables indicating the presence of A and B. In general, $r^2$ is similar to D′ in that it can be nearly one even if one or both alleles are in low frequency.

Still another measure is the quantity $p_A + D/p_B$, which equals $p_{AB}/p_B$. It is the conditional probability that a chromosome carries an A allele, given that it carries a B allele. It is useful for characterizing the extent to which a particular allele is associated with a genetic disease (ref. 120).
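The measures in Box 1 can be computed side by side from one haplotype frequency and two allele frequencies. The Python sketch below is illustrative only, and several things in it are assumptions rather than content of the article: the function and key names are hypothetical, D′ is reported as a non-negative ratio (D divided by $D_{max}$ when D is positive and by $D_{min}$ when D is negative, which is one common convention), and the significance helper uses the standard identity that the Pearson chi-squared statistic of a 2×2 haplotype table, without continuity correction, equals the number of sampled haplotypes times $r^2$, with one degree of freedom.

```python
import math


def ld_measures(p_AB, p_A, p_B):
    """Compute D, its bounds, D', r^2 and P(A | B) for two diallelic loci."""
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic (0 < p < 1)")
    d = p_AB - p_A * p_B                                  # equation 1
    d_min = max(-p_A * p_B, -(1 - p_A) * (1 - p_B))       # less negative bound
    d_max = min(p_A * (1 - p_B), p_B * (1 - p_A))         # smaller upper bound
    # Assumed convention: scale by the bound on the same side as D,
    # so that D' lies between 0 and 1.
    d_prime = d / d_max if d >= 0 else d / d_min
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return {
        "D": d,
        "D_min": d_min,
        "D_max": d_max,
        "D_prime": d_prime,
        "r2": r2,
        "p_A_given_B": p_A + d / p_B,  # equals p_AB / p_B
    }


def chi_square_test(r2, n_haplotypes):
    """Pearson chi-squared (1 d.f.) for a 2x2 haplotype table, as n * r^2.

    The p-value uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    chi2 = n_haplotypes * r2
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical frequencies: haplotype Ab is absent, so D' is 1
    # even though r^2 is only 0.25.
    m = ld_measures(p_AB=0.2, p_A=0.2, p_B=0.5)
    print(m)
    print(chi_square_test(m["r2"], n_haplotypes=100))
```

The example shows why the two statistics are not interchangeable: a missing haplotype forces D′ to 1, while $r^2$ also depends on how closely the allele frequencies at the two loci match. For small samples, an exact test such as Fisher's would be preferable to the chi-squared approximation sketched here.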

