ARTICLES

Genome evolution and adaptation in a long-term experiment with Escherichia coli

Jeffrey E. Barrick1*, Dong Su Yu2,3*, Sung Ho Yoon2, Haeyoung Jeong2, Tae Kwang Oh2,4, Dominique Schneider5, Richard E. Lenski1 & Jihyun F. Kim2,6

The relationship between rates of genomic evolution and organismal adaptation remains uncertain, despite considerable interest. The feasibility of obtaining genome sequences from experimentally evolving populations offers the opportunity to investigate this relationship with new precision. Here we sequence genomes sampled through 40,000 generations from a laboratory population of Escherichia coli. Although adaptation decelerated sharply, genomic evolution was nearly constant for 20,000 generations. Such clock-like regularity is usually viewed as the signature of neutral evolution, but several lines of evidence indicate that almost all of these mutations were beneficial. This same population later evolved an elevated mutation rate and accumulated hundreds of additional mutations dominated by a neutral signature. Thus, the coupling between genomic and adaptive evolution is complex and can be counterintuitive even in a constant environment. In particular, beneficial substitutions were surprisingly uniform over time, whereas neutral substitutions were highly variable.

Adaptation has often been viewed as a gradual process. Darwin1 wrote that "We see nothing of these slow changes in progress, until the hand of time has marked the long lapse of ages...". Theoretical work in quantitative genetics supported this view by showing that gradual adaptation would result from constant selection on many mutations of small effect2. However, an alternative model of evolution on rugged fitness landscapes challenged this perspective3 and, later, empirical evidence was found for alternating periods of rapid phenotypic evolution and stasis in some lineages4,5.
The causes of variation in the rate of adaptation remain controversial and are probably diverse. They may include changes in the environment, in circumstances promoting or impeding gene flow, and in opportunities for refinement following the origin of key innovations or the invasion of new habitats, among other factors6–11.

Genomic changes underlie evolutionary adaptation, but mutations, even those substituted (fixed) in evolving populations, are not necessarily beneficial. Variation in the rate of genomic evolution is also subject to many influences and complications. On the one hand, theory predicts that neutral mutations should accumulate by drift at a uniform rate, albeit stochastically, provided the mutation rate is constant12. On the other hand, rates of substitution of beneficial and deleterious mutations depend on selection, and hence the environment, as well as on population size and structure13,14. Moreover, the relative proportions of substitutions that are neutral, deleterious and beneficial are usually difficult to infer given imperfect knowledge of any organism's genetics and ecology, in the past as well as in the present.

Experiments with tractable model organisms evolving in controlled laboratory environments minimize many of these complications and uncertainties15,16. Moreover, new methods have made it feasible to sequence complete genomes from evolution experiments with bacteria17–20. To date, such analyses have focused on finding the mutations responsible for particular adaptations. However, the application of comparative genome sequencing to experimental evolution studies also offers the opportunity to address major conceptual issues, including whether the dynamics of genomic and adaptive evolution are coupled very tightly or only loosely10,12,13,21,22.

Genome dynamics and adaptation

To examine the tempo and mode of genomic evolution, we sequenced the genomes of E.
coli clones sampled at generations 2,000, 5,000, 10,000, 15,000, 20,000 and 40,000 from an asexual population that evolved with glucose as a limiting nutrient for almost 20 years as part of a long-term experiment. The complete sequence of the ancestral strain served as a reference for identifying mutations in the evolved clones, which we refer to by their generation abbreviations 2K, 5K, 10K, 15K, 20K and 40K.

Figure 1 shows all mutations identified in the evolved clones through 20,000 generations. The 45 mutations in the 20K clone include 29 single-nucleotide polymorphisms (SNPs) and 16 deletions, insertions and other polymorphisms (DIPs). Figure 2 shows that the number of mutational differences between the ancestral and evolved genomes accumulated in a near-linear fashion over this period. Any deviation from linearity was not statistically significant based on randomization tests.

The near-linearity of the trajectory for genomic evolution is rather surprising, given that such constancy is widely taken as a signature of neutral evolution12, whereas the fitness trajectory for this population23 shows profound adaptation that is strongly nonlinear. In particular, the rate of fitness improvement decelerates over time (Fig. 2), which indicates that the rate of appearance of new beneficial mutations is declining, their average benefit is becoming smaller, or both. These effects, in turn, should cause the rate of genomic evolution to decelerate.

To understand this point, consider a simple model of the substitution of beneficial mutations in a clonal population of haploid organisms. A beneficial mutation has an initial frequency of 1/N,

*These authors contributed equally to this work.
1 Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824, USA.
2 Industrial Biotechnology and Bioenergy Research Center, Korea Research Institute of Bioscience and Biotechnology, Yuseong, Daejeon 305-806, Korea.
3 Department of Computer Science and Engineering, Chungnam National University, Yuseong, Daejeon 305-764, Korea.
4 21C Frontier Microbial Genomics and Applications Center, Yuseong, Daejeon 305-806, Korea.
5 Institut Jean Roget, Laboratoire Adaptation et Pathogénie des Microorganismes, CNRS UMR 5163, Université Joseph Fourier, Grenoble 1, BP 170, F-38042 Grenoble cedex 9, France.
6 Functional Genomics Program, School of Science, University of Science and Technology, Yuseong, Daejeon 305-333, Korea.

Vol 461 | 29 October 2009 | doi:10.1038/nature08480
©2009 Macmillan Publishers Limited. All rights reserved
of both adaptive and genomic evolution or, alternatively, no deceleration in either trajectory.

Predominance of beneficial substitutions

The simplest hypothesis that could explain the discrepancy between the nearly constant rate of genomic change and the sharply decelerating fitness trajectory posits that only a small fraction of all substitutions are beneficial, whereas most are neutral or nearly so12,14. Accordingly, the beneficial substitutions would be concentrated in the early phase of rapid adaptation to the conditions of the experiment, but over time that initial burst would be swamped by the constant accumulation of neutral mutations by drift. However, four lines of evidence allow us to reject this explanation.

First, under this drift hypothesis, one expects disproportionately more synonymous than non-synonymous mutations, because the former have no effect on protein sequence and thus are more likely to be neutral. In fact, all 26 point mutations we found in coding regions (22 in clone 20K, and 4 off the line of descent) are non-synonymous. The probability of observing no synonymous substitutions is only 0.07% if the same base changes were distributed randomly in the coding regions of the ancestral genome.

Second, if mutations had spread by random drift, we would not expect to see mutations in the same genes in the other independently evolved populations of the long-term experiment, because only ~1% of the >4,000 genes in E. coli harbour mutations in the population studied here. By contrast, selection should target the same genes in the replicate lines because they started from the same ancestor and evolved in identical environments. Fourteen genes in which mutations were found in our study population have been sequenced in all the other populations after 20,000 generations.
There is substantial parallelism, with three cases where all eleven other populations have substituted mutations in the same gene, nine additional genes with mutations in other lines, and only two cases where no other line has a mutation in the same gene (Table 1). In almost all cases, the evolved alleles differ between the populations, so accidental cross-contamination cannot explain these parallel changes.

Third, under the drift hypothesis, we would expect many mutations in individual clones that did not become fixed in the population as a whole. However, almost all mutations in the earlier clones were present in clones from all subsequent generations. For example, four of the six mutations in clone 2K are present in all later clones, and all thirty-four mutations in clone 15K occur in clones 20K and 40K. Moreover, two of the thirteen mutations through 20K that are off the line of descent to the 40K clone occur in genes (pykF and nadR) where different mutations arose and were substituted later. Both of these genes also have substitutions in all of the other populations, so even these early unsuccessful alleles were probably beneficial, but were nonetheless eliminated because competing sub-lineages had even more beneficial mutations25,26.

Fourth, strains with these mutations should have no fitness advantage under the neutral drift hypothesis. To date, isogenic strains with ancestral and derived alleles have been constructed at nine loci. In all but one case, the derived allele confers a significant advantage in competition (Table 2). The exception (ompF) might also be beneficial in combination with other mutations present in the genetic background in which it evolved, especially because parallel mutations arose in other populations (Table 1). By contrast, another study found that none of 26 random insertion mutations conferred a significant advantage in the same environment27.
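The first line of evidence above rests on a simple probability argument, which can be sketched numerically. The sketch below is a simplification and not the authors' calculation: it assumes each of the 26 coding-region point mutations is independently synonymous with one fixed probability, whereas the article distributes the actual base changes randomly over the ancestral coding regions. The function names and the constant-probability model are assumptions made for illustration only.

```python
def prob_no_synonymous(n_mutations: int, p_synonymous: float) -> float:
    """Probability that none of n independent point mutations is synonymous,
    assuming (hypothetically) one fixed per-mutation synonymous probability."""
    return (1.0 - p_synonymous) ** n_mutations


def implied_synonymous_fraction(n_mutations: int, prob_none: float) -> float:
    """Invert the model: the per-mutation synonymous probability that would
    give the stated chance of seeing zero synonymous changes."""
    return 1.0 - prob_none ** (1.0 / n_mutations)


# Values taken from the text: 26 coding point mutations, probability 0.07%.
N_CODING_POINT_MUTATIONS = 26
REPORTED_PROB_NONE = 0.0007

implied_p = implied_synonymous_fraction(N_CODING_POINT_MUTATIONS, REPORTED_PROB_NONE)
print(f"Implied synonymous fraction per random mutation: {implied_p:.3f}")
print(f"Round trip: {prob_no_synonymous(N_CODING_POINT_MUTATIONS, implied_p):.4%}")
```

Under this simplified model, the reported 0.07% corresponds to roughly a quarter of random coding changes being synonymous, which shows how quickly the chance of observing zero synonymous changes shrinks as non-synonymous mutations accumulate.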
Other explanations for rate discordance

Taken together, these four lines of evidence demonstrate that discordance in rates of genomic and adaptive evolution in this experiment cannot be explained by assuming a preponderance of neutral substitutions. Another plausible explanation for the disparity is an ecological one. Fitness levels were measured, at all generations, in competition with the ancestor. In an evolution experiment with yeast, non-transitive ecological interactions gave rise to complex dynamics, such that the cumulative adaptation measured across successive episodes of selection was greater than that measured directly from start to finish28. However, there is no significant discrepancy between the fitness gains summed over shorter intervals and the overall improvement measured from start to finish for the population in our study23, allowing us to reject this hypothesis.

Clonal interference occurs in asexual organisms when sub-lineages with beneficial mutations are driven extinct by competition with other sub-lineages bearing mutations that are even more beneficial25,26, and this process might contribute to the relatively constant rate of genomic change. In particular, the most beneficial mutations should dominate the early phase of evolution for large populations in a new environment26, but there are more potential mutations that confer small advantages than large ones2,13,29,30. Thus, the supply of contending beneficial mutations may increase enough to sustain a uniform rate of overall genomic change.

It may also be relevant that some early substitutions, which contributed the most to fitness improvement, involve global regulatory functions including the stringent response and DNA supercoiling31,32. These mutations have pleiotropic effects on the expression of many genes, and although these changes are beneficial on balance, some of their side effects are probably deleterious.
These maladaptive side effects may introduce new opportunities for compensatory changes that restore appropriate expression of other genes and thereby further increase the supply of mutations conferring small advantages.

Emergence of a hypermutable phenotype

Several of the long-term populations evolved mutator phenotypes by 20,000 generations, but the population in our study retained the low ancestral mutation rate to at least that time point33,34. In later generations, however, this population exhibited a greatly elevated rate of genomic evolution (Fig. 2 inset). The 40K genome contains 627 SNP and 26 DIP mutations (Supplementary Tables 3 and 4). As a consequence of the DIP mutations (including six new insertions of

Table 1 | Frequency of parallel mutations in 11 other independently evolved lines

Gene or region   Function                         Parallel mutations (%)   Source
nadR             Transcriptional regulator        100                      Ref. 42
pykF             Pyruvate kinase                  100                      Ref. 42
rbs operon       Ribose catabolism                100                      Ref. 43
malT             Transcriptional regulator        64                       Ref. 44
spoT             Stringent response regulator     64                       Ref. 31
mrdA             Cell-wall biosynthesis           45                       Ref. 42
infB             Translation initiation factor 2  45*                      This study
fis              Nucleoid-associated protein      27                       E. Crozat, D.S., unpublished
topA             DNA topoisomerase I              27                       E. Crozat, D.S., unpublished
pcnB             Poly(A) polymerase               27                       This study
ompF             Outer-membrane porin             18*                      This study
rpsD             30S ribosomal protein            18*                      This study
rpsM             30S ribosomal protein            0                        This study
glmU promoter    Cell-wall biosynthesis           0                        M. Stanek, R.E.L., unpublished

* In addition to populations with substitutions, one or more others were polymorphic.
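The percentages in Table 1 can be read as counts out of the 11 other populations. The following sketch converts each percentage to the nearest whole number of lines and tallies the three categories described in the text (all eleven lines, some lines, no lines). The conversion by rounding is an assumption for illustration; the article reports only percentages, and the variable and function names are hypothetical.

```python
N_OTHER_LINES = 11  # the other independently evolved populations (Table 1 title)

# Percentages transcribed from Table 1 (asterisks omitted).
TABLE1_PERCENT = {
    "nadR": 100, "pykF": 100, "rbs operon": 100,
    "malT": 64, "spoT": 64, "mrdA": 45, "infB": 45,
    "fis": 27, "topA": 27, "pcnB": 27,
    "ompF": 18, "rpsD": 18,
    "rpsM": 0, "glmU promoter": 0,
}


def implied_line_count(percent: int, n_lines: int = N_OTHER_LINES) -> int:
    """Nearest whole number of lines consistent with a reported percentage
    (assumes the percentage is a rounded fraction of n_lines)."""
    return round(percent / 100 * n_lines)


implied_counts = {gene: implied_line_count(p) for gene, p in TABLE1_PERCENT.items()}

# Tally the categories named in the text.
n_all = sum(1 for c in implied_counts.values() if c == N_OTHER_LINES)
n_none = sum(1 for c in implied_counts.values() if c == 0)
n_some = len(implied_counts) - n_all - n_none

for gene, count in implied_counts.items():
    print(f"{gene:14s} {count:2d} of {N_OTHER_LINES} lines")
print(f"all lines: {n_all}, some lines: {n_some}, no lines: {n_none}")
```

Read this way, the tally is consistent with the text's summary of fourteen sequenced genes: three mutated in all eleven other populations, nine in some, and two in none.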