Bayesian analysis of correlated e...
vol. 167, no. 6 the american naturalist june 2006 Bayesian Analysis of Correlated Evolution of Discrete Characters by Reversible-Jump Markov Chain Monte Carlo Mark Pagel* and Andrew Meade��� School of Biological Sciences, University of Reading, Reading RG6 6AJ, United Kingdom Submitted July 7, 2005 Accepted January 26, 2006 Electronically published May 9, 2006 abstract: We describe a Bayesian method for investigating cor- related evolution of discrete binary traits on phylogenetic trees. The method fits a continuous-time Markov model to a pair of traits, seeking the best fitting models that describe their joint evolution on a phylogeny. We employ the methodology of reversible-jump (RJ) Markov chain Monte Carlo to search among the large number of possible models, some of which conform to independent evolution of the two traits, others to correlated evolution. The RJ Markov chain visits these models in proportion to their posterior probabilities, thereby directly estimating the support for the hypothesis of corre- lated evolution. In addition, the RJ Markov chain simultaneously estimates the posterior distributions of the rate parameters of the model of trait evolution. These posterior distributions can be used to test among alternative evolutionary scenarios to explain the ob- served data. All results are integrated over a sample of phylogenetic trees to account for phylogenetic uncertainty. We implement the method in a program called RJ Discrete and illustrate it by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes. Keywords: correlated evolution, Bayesian, MCMC, Discrete, reversible- jump, comparative methods. Comparative biologists may frequently wish to test hy- potheses about whether two traits have coevolved through- out their evolutionary history. This is one of evolutionary biology���s most enduring methods for investigating adap- tation and evolution. A correlation between two traits may suggest that both are responding to some common evo- * E-mail: m.pagel@rdg.ac.uk. ��� E-mail: a.meade@rdg.ac.uk. Am. Nat. 2006. Vol. 167, pp. 808���825. 2006 by The University of Chicago. 0003-0147/2006/16706-41177$15.00. All rights reserved. lutionary force or that one acts as a selective force for change in the other. Hypotheses about correlated evolution may involve traits that naturally vary along a continuous dimension, such as body size or wing length, or that adopt only a finite number of discrete states, such as mating system or the presence or absence of some feature. Our interest in this article is to investigate correlated evolution of discrete traits. Pagel (1994a) described a continuous-time Markov model approach that calculates the likelihood of discrete trait data under two models of evolution, one in which the traits are allowed to evolve independently of one another on a phylogenetic tree and one in which they evolve in a correlated or dependent fashion. If the likelihood of observing the trait data under the model of dependent trait evolution exceeds that for the model of independent trait evolution, the null hy- pothesis of no correlation is rejected in favor of the view that the traits have coevolved. In its original formulation, this model, known as Dis- crete, was estimated by maximum likelihood methods, and the test of correlated evolution was typically performed on a single phylogenetic tree. This practice, routine in com- parative studies, implicitly assumes that the investigator has used the true phylogeny. But phylogenies are inferences from data, subject to error, and different estimates of the phylogenetic tree can give different answers to comparative hypothesis tests. Attempts to resolve the problem of phy- logenetic uncertainty include testing hypotheses in hap- hazard samples of ���best��� or favored trees, using all equally parsimonious trees, and using a consensus tree. These ap- proaches, although a step in the right direction, are of uncertain value because they lack a clear probabilistic basis for drawing a sample of trees on which to test the com- parative hypothesis. Bayesian Markov chain Monte Carlo (MCMC) statis- tical methods (e.g., Geyer 1992 Gilks et al. 1996) offer a solution to the problems of sampling phylogenies and pro- vide a way to account for phylogenetic uncertainty in com- parative studies. These methods can be used to draw a
Reversible-Jump Discrete 809 Figure 1: Transitions among the four combinations of states resulting from two binary variables. Subscripts i and j identify the beginning and ending states, respectively, of each transition, where the values 1, 2, 3, and 4 correspond to the state pairs {0,0}, {0,1}, {1,0}, and {1,1}. Thus, q12 describes the transition between state {0,0} and state {0,1} over short time interval dt. Dual transitions in which the states of both variables instantaneously change are assumed not to occur. sample of trees that estimates the posterior probability distribution of phylogenies (e.g., Larget and Simon 1999). The comparative hypothesis is then tested by summing over trees, effectively integrating over the tree space, re- moving phylogenetic uncertainty from the result (Pagel and Lutzoni 2002). We have implemented the Discrete method in a Bayesian framework in which MCMC meth- ods are used to estimate ancestral character states (Pagel et al. 2004) and to test for correlated evolution (Pagel and Meade 2005). In addition to accounting for phylogenetic uncertainty, the Bayesian approach simultaneously esti- mates the posterior probability distributions of the param- eters of the model of trait evolution. Our aim here is to show how a particular form of Bayesian analysis, known as reversible-jump (RJ) MCMC (Green 1995), can additionally be used to guide the choice of the best-fitting model of trait evolution within Discrete, while simultaneously testing the hypothesis of a correlation between two traits. The significance of the RJ MCMC ap- proach in this context is that there are more than 21,000 distinct models of correlated trait evolution to choose from, varying in the number of parameters used to explain the data. Unlike with most modern phylogenetic inference, in which the researcher typically has many hundreds or thousands of data points or sites in the alignment, the comparative biologist is frequently limited to just one or two traits to estimate the parameters of the model of trait evolution. This places a premium on choosing the best model to describe the data���one that balances numbers of parameters against being able to detect the patterns of coevolution in the two traits under study. Conventional means of selecting the best model require the investigator to try multiple tests among alternative pairs that vary in the number of parameters and the con- straints put on them. The RJ MCMC methodology cir- cumvents this by constructing a Markov chain that ex- plores the universe of possible models, visiting them in direct proportion to their posterior probabilities. The model or models that emerge from the chain will normally have direct relevance to testing among alternative evolu- tionary scenarios to describe how the traits arose and coevolved. To illustrate the RJ MCMC methodology, we revisit the question of whether the advertisement of estrus by female Old World monkeys tends to occur in primate societies with multimale mating systems in which females mate with more than one male. Several previous investigators have suggested this link (Clutton-Brock and Harvey 1976 Hrdy and Whitten 1987 Hrdy 1988 Sille ��n-Tullberg and M��ller 1993 Pagel 1994b Domb and Pagel 2001). Domb and Pagel (2001) showed that females with the most pro- nounced estrus advertisement have higher expected life- time reproductive success, providing a reason for males to be interested. We show how the parameters of evolutionary models that bear on alternative hypotheses to explain this relationship can be estimated on a molecular phylogeny of the Old World monkeys. In the sections to follow, we first describe the model of correlated evolution and its parameters. This is followed by a discussion of the number of distinct forms the model can take. We then describe MCMC methods and the RJ methodology before applying the methods to the primate data. The Model of Correlated Trait Evolution Our description here of the Discrete model only partly overlaps with that of Pagel (1994a). Two binary traits can yield four different combinations of states, as shown in figure 1. These states are observed in species and arrayed on a phylogeny. The model attempts to discover the evo- lutionary pathways that held throughout the evolutionary history of the species and eventually gave rise to the ob- served data. Figure 1 links the four combinations of states by arrows with parameters that describe the evolutionary rates of transition between the two states of one variable, holding constant the state of the other. The subscripts i and j identify the beginning and ending states, respectively, of a particular transition, where the values 1, 2, 3, and 4 correspond to the state pairs {0,0}, {0,1}, {1,0}, and {1,1}. If two traits have evolved independently of one another, then the rate of change between the two states of one variable will not depend on the background state of the other. For example, if the rate of change from state 0 to state 1 in the second variable does not depend on the state of the first variable, then q12 will be equal to q34. If 0 and
810 The American Naturalist 1 refer to the absence and presence, respectively, of some trait, then implies that gains of trait 2 are in- q12 p q 34 dependent of the presence or absence of trait 1. More generally, the model of independent evolution can be de- fined by setting , , and q12 p q q13 p q q p q43, 34 24 21 it therefore uses a maximum of four parameters q p q42 31 and as few as one if all four pairs adopt the same value. The model of correlated or dependent evolution does not place any restrictions on the parameters, instead al- lowing some kinds of transitions to depend on the back- ground state of the other. When this is true, pairs of states will tend to be associated with each other in the species data more often than expected by chance, and the depen- dent model will provide a better description of the data. Thus, if the pair of states {1,1} is observed more often than {0,1}, it may suggest that , and the depen- q12 ( q 34 dent model will account for the data better than the in- dependent model. Many other dependent models are pos- sible (see Pagel 1994a for some general cases) that require as few as two transition rates and as many as eight. The method is formally described by a rate matrix Q: 0,0 0,1 1,0 1,1 0,0 ��� q q13 0 12 ��� ��� 0,1 q ��� 0 q 21 24 Q p , (1) I,D 1,0 q31 0 ��� q34��� ��� 1,1 0 q q ������ ��� 42 43 where we use the QI,D notation to indicate that the matrix can be configured to either the independent or the de- pendent (correlated) model, depending on whether some pairs of transition rates are equivalent. The main diagonal elements are defined as minus the sum of the other rate coefficients in the row of the matrix, so that each row sums to zero. The values of all dual transitions, or cases in which both traits change simultaneously, are set to zero. Dual transitions are set to zero because their transition rate parameters measure the probability of both traits changing simultaneously in an infinitesimally short inter- val dt. The probability of both traits changing in the same instant is negligibly small and can be ignored. The model does allow both traits to change over a longer interval t, however. Numbering the four states in the ma- trix above 1���4, as before, we can write the probability of a change in short interval dt as The prob- Pij (dt) p q dt. ij abilities over longer intervals t are found by exponentiating the matrix multiplied by the length of the interval: Q P11 (t) P12 (t) P13 (t) P14(t) ��� ��� P21 (t) P22 (t) P23 (t) P24(t) Qt P(t) p e p . (2) P31 (t) P32 (t) P33 (t) P34(t)��� ��� P41 (t) P42 (t) P43 (t) P44(t)��� ��� Combining these probabilities over all branches of the tree and all possible states at each node (see Pagel 1994a) yields the likelihood of the data given the model of trait evo- lution, or P(DFQ). The Number of Different Models Writing the rate parameters from equation (1) in pairs as described for the model of independent evolution gives the string ({q12,q34},{q13,q24},{q21,q43},{q31,q42}). We can as- sign integers to these elements so that two elements with the same integer adopt the same rate���they fall in the same rate class. Independent evolution of two traits pre- dicts that the members of each pair in this string will be in the same rate class. The model described by (1,1,2,2,3,3,4,4) is the most general model of independent evolution. Even though rates vary, the relevant pairs always evolve at the same rate. The model (1,1,1,1,1,1,1,1) sets all of the rates equal to each other. It is also a model of independent evolution because for all relevant pairs, the rate at which one trait changes does not depend on the value of the other trait. Using the same pairwise classification, the model (1,2,3,4,5,6,7,8) describes the most general model of trait evolution, using all eight of the rate parameters. It is a model of correlated evolution because the rate of change of one trait depends on the value of the second trait: the members of each pair are evolving at different rates. Many other models of correlated evolution can be constructed using fewer than eight parameters. How many models are there? Stirling numbers of the second kind count the number of distinct ways of ar- ranging n objects into c classes. Denoting the Stirling num- bers as S2(n,c), we can compute them for any n and c from c 1 1 c) i S (n,c) p ( 1) (c i)n. 2 ( i c! ip0 If n is 3 and c is 1, there is just one outcome: all three objects are in the same class. For and there n p 3 c p 2, are three possible outcomes: ((1,2),3), ((1,3),2), and ((2,3),1), each one placing a different pair in the same class. For and there is again just one out- n p 3 c p 3, come: each object in a different class. A common Stirling number is the number of ways of 1) S (n,2) p 2(n 1, 2 dividing n objects into two classes. The Stirling numbers for eight objects are S (8,1) p 1, 2 S (8,2) p 127, S (8,3) p 966, S (8,4) p 1,701, S (8,5) p 2 2 2 2 and giv- 1,050, S (8,6) p 266, S (8,7) p 28, S (8,8) p 1, 2 2 2 ing a total of 4,140 distinct models for the full eight- parameter model of equation (1). This sum is referred to as the Bell number for eight objects. The last of these, S2(8,8), is the model (1,2,3,4,5,6,7,8). One of the 28 models