Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo

by ,
The American naturalist ()
or

Abstract

We describe a Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees. The method fits a continuous-time Markov model to a pair of traits, seeking the best fitting models that describe their joint evolution on a phylogeny. We employ the methodology of reversible-jump RJ) Markov chain Monte Carlo to search among the large number of possible models, some of which conform to independent evolution of the two traits, others to correlated evolution. The RJ Markov chain visits these models in proportion to their posterior probabilities, thereby directly estimating the support for the hypothesis of correlated evolution. In addition, the RJ Markov chain simultaneously estimates the posterior distributions of the rate parameters of the model of trait evolution. These posterior distributions can be used to test among alternative evolutionary scenarios to explain the observed data. All results are integrated over a sample of phylogenetic trees to account for phylogenetic uncertainty. We implement the method in a program called RJ Discrete and illustrate it by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.

Page 1

Bayesian analysis of correlated e...

Page 2
Reversible-Jump Discrete 809 Figure 1: Transitions among the four combinations of states resulting from two binary variables. Subscripts i and j identify the beginning and ending states, respectively, of each transition, where the values 1, 2, 3, and 4 correspond to the state pairs {0,0}, {0,1}, {1,0}, and {1,1}. Thus, q12 describes the transition between state {0,0} and state {0,1} over short time interval dt. Dual transitions in which the states of both variables instantaneously change are assumed not to occur. sample of trees that estimates the posterior probability distribution of phylogenies (e.g., Larget and Simon 1999). The comparative hypothesis is then tested by summing over trees, effectively integrating over the tree space, re- moving phylogenetic uncertainty from the result (Pagel and Lutzoni 2002). We have implemented the Discrete method in a Bayesian framework in which MCMC meth- ods are used to estimate ancestral character states (Pagel et al. 2004) and to test for correlated evolution (Pagel and Meade 2005). In addition to accounting for phylogenetic uncertainty, the Bayesian approach simultaneously esti- mates the posterior probability distributions of the param- eters of the model of trait evolution. Our aim here is to show how a particular form of Bayesian analysis, known as reversible-jump (RJ) MCMC (Green 1995), can additionally be used to guide the choice of the best-fitting model of trait evolution within Discrete, while simultaneously testing the hypothesis of a correlation between two traits. The significance of the RJ MCMC ap- proach in this context is that there are more than 21,000 distinct models of correlated trait evolution to choose from, varying in the number of parameters used to explain the data. Unlike with most modern phylogenetic inference, in which the researcher typically has many hundreds or thousands of data points or sites in the alignment, the comparative biologist is frequently limited to just one or two traits to estimate the parameters of the model of trait evolution. This places a premium on choosing the best model to describe the data���one that balances numbers of parameters against being able to detect the patterns of coevolution in the two traits under study. Conventional means of selecting the best model require the investigator to try multiple tests among alternative pairs that vary in the number of parameters and the con- straints put on them. The RJ MCMC methodology cir- cumvents this by constructing a Markov chain that ex- plores the universe of possible models, visiting them in direct proportion to their posterior probabilities. The model or models that emerge from the chain will normally have direct relevance to testing among alternative evolu- tionary scenarios to describe how the traits arose and coevolved. To illustrate the RJ MCMC methodology, we revisit the question of whether the advertisement of estrus by female Old World monkeys tends to occur in primate societies with multimale mating systems in which females mate with more than one male. Several previous investigators have suggested this link (Clutton-Brock and Harvey 1976 Hrdy and Whitten 1987 Hrdy 1988 Sille ��n-Tullberg and M��ller 1993 Pagel 1994b Domb and Pagel 2001). Domb and Pagel (2001) showed that females with the most pro- nounced estrus advertisement have higher expected life- time reproductive success, providing a reason for males to be interested. We show how the parameters of evolutionary models that bear on alternative hypotheses to explain this relationship can be estimated on a molecular phylogeny of the Old World monkeys. In the sections to follow, we first describe the model of correlated evolution and its parameters. This is followed by a discussion of the number of distinct forms the model can take. We then describe MCMC methods and the RJ methodology before applying the methods to the primate data. The Model of Correlated Trait Evolution Our description here of the Discrete model only partly overlaps with that of Pagel (1994a). Two binary traits can yield four different combinations of states, as shown in figure 1. These states are observed in species and arrayed on a phylogeny. The model attempts to discover the evo- lutionary pathways that held throughout the evolutionary history of the species and eventually gave rise to the ob- served data. Figure 1 links the four combinations of states by arrows with parameters that describe the evolutionary rates of transition between the two states of one variable, holding constant the state of the other. The subscripts i and j identify the beginning and ending states, respectively, of a particular transition, where the values 1, 2, 3, and 4 correspond to the state pairs {0,0}, {0,1}, {1,0}, and {1,1}. If two traits have evolved independently of one another, then the rate of change between the two states of one variable will not depend on the background state of the other. For example, if the rate of change from state 0 to state 1 in the second variable does not depend on the state of the first variable, then q12 will be equal to q34. If 0 and
Page 3
810 The American Naturalist 1 refer to the absence and presence, respectively, of some trait, then implies that gains of trait 2 are in- q12 p q 34 dependent of the presence or absence of trait 1. More generally, the model of independent evolution can be de- fined by setting , , and q12 p q q13 p q q p q43, 34 24 21 it therefore uses a maximum of four parameters q p q42 31 and as few as one if all four pairs adopt the same value. The model of correlated or dependent evolution does not place any restrictions on the parameters, instead al- lowing some kinds of transitions to depend on the back- ground state of the other. When this is true, pairs of states will tend to be associated with each other in the species data more often than expected by chance, and the depen- dent model will provide a better description of the data. Thus, if the pair of states {1,1} is observed more often than {0,1}, it may suggest that , and the depen- q12 ( q 34 dent model will account for the data better than the in- dependent model. Many other dependent models are pos- sible (see Pagel 1994a for some general cases) that require as few as two transition rates and as many as eight. The method is formally described by a rate matrix Q: 0,0 0,1 1,0 1,1 0,0 ��� q q13 0 12 ��� ��� 0,1 q ��� 0 q 21 24 Q p , (1) I,D 1,0 q31 0 ��� q34��� ��� 1,1 0 q q ������ ��� 42 43 where we use the QI,D notation to indicate that the matrix can be configured to either the independent or the de- pendent (correlated) model, depending on whether some pairs of transition rates are equivalent. The main diagonal elements are defined as minus the sum of the other rate coefficients in the row of the matrix, so that each row sums to zero. The values of all dual transitions, or cases in which both traits change simultaneously, are set to zero. Dual transitions are set to zero because their transition rate parameters measure the probability of both traits changing simultaneously in an infinitesimally short inter- val dt. The probability of both traits changing in the same instant is negligibly small and can be ignored. The model does allow both traits to change over a longer interval t, however. Numbering the four states in the ma- trix above 1���4, as before, we can write the probability of a change in short interval dt as The prob- Pij (dt) p q dt. ij abilities over longer intervals t are found by exponentiating the matrix multiplied by the length of the interval: Q P11 (t) P12 (t) P13 (t) P14(t) ��� ��� P21 (t) P22 (t) P23 (t) P24(t) Qt P(t) p e p . (2) P31 (t) P32 (t) P33 (t) P34(t)��� ��� P41 (t) P42 (t) P43 (t) P44(t)��� ��� Combining these probabilities over all branches of the tree and all possible states at each node (see Pagel 1994a) yields the likelihood of the data given the model of trait evo- lution, or P(DFQ). The Number of Different Models Writing the rate parameters from equation (1) in pairs as described for the model of independent evolution gives the string ({q12,q34},{q13,q24},{q21,q43},{q31,q42}). We can as- sign integers to these elements so that two elements with the same integer adopt the same rate���they fall in the same rate class. Independent evolution of two traits pre- dicts that the members of each pair in this string will be in the same rate class. The model described by (1,1,2,2,3,3,4,4) is the most general model of independent evolution. Even though rates vary, the relevant pairs always evolve at the same rate. The model (1,1,1,1,1,1,1,1) sets all of the rates equal to each other. It is also a model of independent evolution because for all relevant pairs, the rate at which one trait changes does not depend on the value of the other trait. Using the same pairwise classification, the model (1,2,3,4,5,6,7,8) describes the most general model of trait evolution, using all eight of the rate parameters. It is a model of correlated evolution because the rate of change of one trait depends on the value of the second trait: the members of each pair are evolving at different rates. Many other models of correlated evolution can be constructed using fewer than eight parameters. How many models are there? Stirling numbers of the second kind count the number of distinct ways of ar- ranging n objects into c classes. Denoting the Stirling num- bers as S2(n,c), we can compute them for any n and c from c 1 1 c) i S (n,c) p ( 1) (c i)n. 2 ( i c! ip0 If n is 3 and c is 1, there is just one outcome: all three objects are in the same class. For and there n p 3 c p 2, are three possible outcomes: ((1,2),3), ((1,3),2), and ((2,3),1), each one placing a different pair in the same class. For and there is again just one out- n p 3 c p 3, come: each object in a different class. A common Stirling number is the number of ways of 1) S (n,2) p 2(n 1, 2 dividing n objects into two classes. The Stirling numbers for eight objects are S (8,1) p 1, 2 S (8,2) p 127, S (8,3) p 966, S (8,4) p 1,701, S (8,5) p 2 2 2 2 and giv- 1,050, S (8,6) p 266, S (8,7) p 28, S (8,8) p 1, 2 2 2 ing a total of 4,140 distinct models for the full eight- parameter model of equation (1). This sum is referred to as the Bell number for eight objects. The last of these, S2(8,8), is the model (1,2,3,4,5,6,7,8). One of the 28 models

by Discipline

34% Ph.D. Student

23% Post Doc

11% Researcher (at an Academic Institution)
by Country

23% United States

9% United Kingdom

9% Spain