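Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data

by Michael J. Sanderson and James A. Doyle
American Journal of Botany 88(8): 1499–1516 (2001)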
  • ISSN: 00029122

Abstract

Molecular estimates of the age of angiosperms have varied widely, and many greatly predate the Early Cretaceous appearance of angiosperms in the fossil record, but there have been few attempts to assess confidence limits on ages. Experiments with rbcL and 18S data using maximum likelihood suggest that previous angiosperm age estimates were too old because they assumed equal rates across sites (use of a gamma distribution of rates to correct for site-to-site variation gives ages 10–30 my [million years] younger) and relied on herbaceous angiosperm taxa with high rates of molecular evolution. Ages based on first and second codon positions of rbcL are markedly older than those based on third positions, which conflict with the fossil record in being too young, but all examined data partitions of rbcL and 18S depart substantially from a molecular clock. Age estimates are surprisingly insensitive to different views on seed-plant relationships. Randomization schemes were used to quantify confidence intervals due to phylogenetic uncertainty, substitutional noise, and lineage effects (deviations from a molecular clock). Estimates of the age of crown-group angiosperms range from 68 to 281 mya (million years ago), depending on data, tree, and assumptions, with most around 140–190 mya (Early Jurassic–earliest Cretaceous). Approximate 95% confidence intervals on ages are wider for rbcL than 18S, ranging up to 160 my for phylogenetic uncertainty, 90 my for substitutional noise, and 70 my for lineage effects. These intervals overlap the oldest occurrences of angiosperms in the fossil record, as well as some estimates from previous molecular studies.

Available from www.amjbot.org
American Journal of Botany 88(8): 1499–1516. 2001.

SOURCES OF ERROR AND CONFIDENCE INTERVALS IN ESTIMATING THE AGE OF ANGIOSPERMS FROM RBCL AND 18S RDNA DATA¹

MICHAEL J. SANDERSON² AND JAMES A. DOYLE

Section of Evolution and Ecology, University of California, Davis, California 95616 USA

Key words: angiosperms; confidence intervals; fossil record; molecular clock; rbcL; 18S rDNA.

The age of the angiosperms has long been a topic of controversy in plant evolution. Traditionally, this problem was addressed from a paleobotanical point of view, but in recent years studies based on the hypothesis of a molecular clock have added a new perspective (Martin, Gierl, and Saedler, 1989; Wolfe et al., 1989; Brandl, Mann, and Sprinzl, 1992; Martin et al., 1993; Laroche, Li, and Bousquet, 1995; Goremykin, Hansmann, and Martin, 1997; Sanderson, 1997). Since these analyses conflict both with interpretations of the fossil record and with each other, they relate not only to paleobotanical assumptions but also to the general validity of the molecular clock, a major issue in molecular evolution (Fitch, 1976; Gillespie, 1991; Ayala, 1997).

In this paper, we address the possibility that some of the apparent conflict between molecular and fossil estimates may stem from insufficient attention to sources of error and assessment of confidence limits on age estimates based on molecular data. Because of the potential importance of deviations from true global rate constancy, we consider a much larger sample of taxa than previous age studies. First, we present experiments with data from two genes that have been widely studied for this and related problems, the chloroplast gene rbcL and 18S nuclear rDNA (ribosomal DNA), which suggest that errors in tree topology and variation in rates among lineages can lead to erroneous age estimates. Second, we attempt to obtain a more reliable assessment of the confidence interval on molecular age estimates based on rbcL and 18S data, which allows us to quantify several potential sources of error in these estimates.

¹ Manuscript received 3 August 2000; revision accepted 13 February 2001.
The authors thank Vincent Savolainen, Doug Soltis, and Jeff Thorne for access to results prior to publication, Doug Soltis for 18S sequences, and Sean Graham and an anonymous reviewer for helpful suggestions on the manuscript. This work was supported by NSF grant DEB-9726856.
² Author for reprint requests (e-mail: mjsanderson@ucdavis.edu).

Previous estimates. Until the 1960s, it was widely assumed that angiosperms originated long before their first unquestioned fossil record in the mid-Early Cretaceous, based on assignment of Cretaceous fossils (mostly leaves) to diverse and advanced extant taxa (Axelrod, 1952, 1970). However, more recent studies of fossil pollen, leaves, flowers, and fruits have indicated that Early Cretaceous angiosperms were far less advanced than previously believed and have painted a coherent picture of rapid morphological diversification, which in its specifics agrees with views on angiosperm evolution based on modern plants (Doyle, 1969, 1978; Muller, 1970, 1981; Doyle and Hickey, 1976; Friis and Crepet, 1987; Doyle and Donoghue, 1993; Friis, Pedersen, and Crane, 1994; Crane, Friis, and Pedersen, 1995). At present, the oldest definite angiosperm fossils are pollen grains of Valanginian or Hauterivian age, 130 mya (million years ago) (Trevisan, 1988; Hughes, 1994; Brenner, 1996); a supposed Jurassic record (Sun et al., 1998) has been redated as Early Cretaceous (Swisher et al., 1999). These data suggest that angiosperms may have originated barely before their first fossil records, although they do not rule out the existence of older angiosperms that were rare and plesiomorphic.

The application of phylogenetic thinking to living and fossil seed plants has also affected this discussion. Any extant group has two ages: the age at which its stem lineage branched from the line leading to its extant sister group, and the age of the most recent common ancestor of all its living members, or the crown group (Hennig, 1965; Jefferies, 1979).
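
The distinction between stem age and crown age can be made concrete with a small sketch. The toy tree below, its node names, and its ages are hypothetical values chosen only for illustration (they are not estimates from this study), and `path_to_root`, `mrca`, and `crown_and_stem_age` are illustrative helper names.

```python
# Toy dated tree: node -> (parent, age in mya). Tips have age 0.
# All names and ages here are hypothetical, for illustration only.
TREE = {
    "seed_plants": (None, 320.0),
    "gymnosperms": ("seed_plants", 0.0),
    "angiosperm_crown": ("seed_plants", 150.0),
    "Amborella": ("angiosperm_crown", 0.0),
    "other_angiosperms": ("angiosperm_crown", 0.0),
}


def path_to_root(tree, node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path


def mrca(tree, tips):
    """Most recent common ancestor of a non-empty list of tips."""
    first = path_to_root(tree, tips[0])
    shared = set(first)
    for tip in tips[1:]:
        shared &= set(path_to_root(tree, tip))
    # Walking upward from the first tip, the first shared node is the MRCA.
    for node in first:
        if node in shared:
            return node
    raise ValueError("tips do not share an ancestor")


def crown_and_stem_age(tree, living_members):
    """Crown age = age of the MRCA of the living members.
    Stem age = age of the node where that lineage split from its sister."""
    crown_node = mrca(tree, living_members)
    parent = tree[crown_node][0]
    crown_age = tree[crown_node][1]
    stem_age = tree[parent][1] if parent is not None else None
    return crown_age, stem_age


crown, stem = crown_and_stem_age(TREE, ["Amborella", "other_angiosperms"])
print(f"crown age: {crown} mya, stem age: {stem} mya")
```

In this toy tree the stem age is more than twice the crown age, which is the kind of gap the text describes between the age of the angiosperm stem lineage and the age of the crown group.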
Following Doyle and Donoghue (1993), we restrict the term "angiosperms" to the crown group; this is the age addressed by molecular studies. Most phylogenetic analyses based on morphology have indicated that the sister group of angiosperms is Gnetales, Gnetales plus Bennettitales, or Caytonia (Crane, 1985; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Rothwell and Serbet, 1994; Doyle, 1996). Since all these taxa are known
back to the Late Triassic, these results imply that the angiosperm stem lineage is also this old. However, the crown group could be much younger, especially considering the many apomorphies that distinguish angiosperms from other seed plants and the plesiomorphic nature of Early Cretaceous fossils. Molecular analyses have generally refuted the relationship of angiosperms and Gnetales, and several indicate that angiosperms and extant gymnosperms are sister groups, pushing the angiosperm stem lineage back to the mid-Carboniferous (Goremykin et al., 1996; Chaw et al., 1997, 2000; Hansen et al., 1999; Qiu et al., 1999; Winter et al., 1999; Bowe, Coat, and dePamphilis, 2000; Donoghue and Doyle, 2000). However, this does not rule out a relationship of angiosperms with Mesozoic groups such as Bennettitales or Caytonia, and it does not relate directly to the age of the crown group.

The first molecular studies gave far older ages for the angiosperms than their oldest fossil records. Ramshaw et al. (1972) obtained an estimate of 350–420 mya (Late Silurian–Mississippian) based on amino acid sequences of cytochrome c, calibrated with the bird–mammal split. Using nonsynonymous substitutions in the nuclear gene gapC, calibrated with the animal fossil record and the presumed divergence of plants, animals, and fungi at 1000 mya, Martin, Gierl, and Saedler (1989) dated the split between monocots (two grasses) and dicots (Magnolia and six eudicots) as 319 mya (mid-Carboniferous). This is more than twice the age of the oldest fossils; at that time, the most advanced known seed plants were "seed ferns" more plesiomorphic than all living seed plants, to say nothing of angiosperms. Martin, Gierl, and Saedler (1989) dismissed the concept of a Cretaceous origin as based on negative evidence and suggested that their results favored the views of Axelrod (1952, 1970). However, Crane et al.
(1989) argued that the conflict with the fossil record is not so easy to explain away. In particular, Martin, Gierl, and Saedler dated the common ancestor of eudicots as 276 mya (Permian), but eudicots (a strongly supported monophyletic group: Chase et al., 1993; Soltis et al., 1998; Qiu et al., 1999; Soltis, Soltis, and Chase, 1999) are united by tricolpate pollen, which has a dense fossil record, beginning in the late Barremian (120 mya: Doyle, 1992; Hughes, 1994) and becoming ubiquitous in the Albian (110 mya). Furthermore, Albian eudicots represent lines near the base of this clade (Doyle, 1998b; Magallón, Crane, and Herendeen, 1999).

Subsequent studies made the improvement of calibrating dates with other land plants. Some have given more recent ages, though still pre-Cretaceous. Wolfe et al. (1989) dated the angiosperms as 200 mya (Early Jurassic), using rRNA (ribosomal RNA) sequences, several chloroplast genes, and two calibrations: the divergence of three grasses at 60 mya and the split of liverworts from other land plants at 400 mya (Early Devonian), which is probably 50 my (million years) too recent (vascular plant megafossils extend back to the Middle Silurian and land plant spores to the Middle Ordovician: Kenrick and Crane, 1997). For rRNA, they also had a cycad sequence; this diverged from angiosperms at 340 mya (Mississippian), which is consistent with fossil data. Laroche, Li, and Bousquet (1995) also dated angiosperms at 200 mya, based on nonsynonymous substitutions in several mitochondrial genes, calibrated with grasses and legumes. However, other studies with improved calibrations have given older ages. Martin et al. (1993) added a liverwort and a conifer and used nonsynonymous substitutions in both gapC and rbcL; assuming that liverworts diverged at 450 mya (Late Ordovician) and conifers at 330 mya (Late Mississippian), they dated the monocot–dicot split as 300 mya (Late Pennsylvanian).
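
The dependence of such dates on the calibration can be illustrated with a minimal sketch of strict-clock dating, assuming a single calibration node and perfectly clocklike branch lengths. The studies cited above used more elaborate procedures (Wolfe et al., for instance, used two calibrations), so this is not a reconstruction of any of them. The function name `clock_age` and the node depths are hypothetical; the calibration ages of 400 and 450 mya are the liverwort divergence dates mentioned in the text.

```python
def clock_age(node_depth, calib_depth, calib_age_mya):
    """Age of a node under a strict molecular clock.

    node_depth    -- substitutions per site from the node to the present
    calib_depth   -- substitutions per site from the calibration node to the present
    calib_age_mya -- fossil-based age assigned to the calibration node
    """
    if node_depth < 0 or calib_depth <= 0 or calib_age_mya <= 0:
        raise ValueError("depths and calibration age must be positive")
    rate = calib_depth / calib_age_mya  # substitutions per site per my
    return node_depth / rate


# Hypothetical depths: the node of interest is half as deep as the calibration node.
NODE_DEPTH = 0.05
CALIB_DEPTH = 0.10

for calib_age in (400.0, 450.0):
    age = clock_age(NODE_DEPTH, CALIB_DEPTH, calib_age)
    print(f"calibration {calib_age:.0f} mya -> node age {age:.1f} mya")
```

Under these assumptions, moving the calibration from 400 to 450 mya raises every inferred age by the same factor (450/400), so an error in the calibration age propagates proportionally to all dated nodes.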
In a study of chloroplast transfer RNAs, calibrated with divergence of a liverwort and two grasses, Brandl, Mann, and Sprinzl (1992) also obtained a 300 mya age for angiosperms.

The youngest estimate so far was obtained by Goremykin, Hansmann, and Martin (1997), based on protein sequences of 58 genes from six completely sequenced chloroplast genomes (Porphyra, Marchantia, Pinus, Nicotiana, Oryza, Zea). Assuming that Marchantia diverged at 450 mya, these authors dated the angiosperms as 160 mya (Late Jurassic) and the split between Pinus and angiosperms as 348 mya (Early Carboniferous), which they noted is more congruent with fossil evidence than their earlier results (Martin, Gierl, and Saedler, 1989; Martin et al., 1993). However, they found strong lineage-specific rate variation in the two grass genomes and therefore calculated the angiosperm age from the root node to Nicotiana only. Thus, although their analysis used an unprecedented number of genes, their dates were based on a very small number of taxa.

Sanderson (1997) used an experimental method (NPRS) for reconstructing ages in the absence of a molecular clock, which smooths local variations in rates by an optimization algorithm. Based on 36 land plant rbcL sequences and a land plant calibration of 450 mya, he obtained an estimate of 165 mya (Middle Jurassic). Using the same rbcL data set, Thorne, Kishino, and Painter (1998, fig. 3) used a model-based Bayesian approach to calculate that the angiosperm root node is 51% as old as the most recent common ancestor of vascular plants (i.e., 200 mya, Early Jurassic). Both methods assume an autocorrelation in rates of molecular evolution across the tree, the presence or magnitude of which has yet to be determined.

Sources of error in estimating divergence times. These dates are in considerable conflict with each other and with the fossil record.
Some of this conflict can be attributed to biases in the data or the statistical estimation methods used, but much of it is probably due to stochastic and deterministic aspects of the molecular evolutionary process itself, especially rate variation across lineages, or "lineage effects" (Britten, 1986; Gillespie, 1991; Gaut, Muse, and Clegg, 1993; Avise, 1994; Clegg et al., 1994; Nickrent and Starr, 1994; Li, 1997; Yang and Nielsen, 1998). Even with a stochastically constant rate, substitutional noise imposes an absolute lower bound on errors in age estimates (Kumar, Tamura, and Nei, 1993; Hillis, Mable, and Moritz, 1996). Variation in rate across sites causes sequence divergences to be estimated incorrectly, most severely at high rates (Gillespie, 1986; Yang, 1996) and high rate variability (Kelly and Rice, 1996; Miyamoto and Fitch, 1996; Yang, 1996). Still other errors relate to the underlying phylogenetic context for molecular divergence, including incorrect phylogenies and calibrations that associate fossil ages with the wrong nodes of a tree.

Several of the angiosperm studies reported the error rate in estimation of branch lengths due to substitutional noise (e.g., Goremykin, Hansmann, and Martin, 1997), but only Martin, Gierl, and Saedler (1989), Martin et al. (1993), and Sanderson (1997) used it to assess the corresponding errors in age estimates. Several studies tested for lineage effects, but only Wolfe et al. (1989) assessed the error component due to these. Wolfe et al. (1989), Brandl, Mann, and Sprinzl (1992), Laroche, Li, and Bousquet (1995), and Goremykin, Hansmann, and Martin (1997) considered calibration error (although the last
authors, concluding that substitutional noise was relatively low, subsumed it in the calibration error). None of these studies considered between-site sequence rate heterogeneity or choice of the tree used in deriving age estimates. The ideal tree, of course, would be the true tree. Most studies have used trees derived from phylogenetic analysis of each gene under study, but many of these are clearly incorrect as species trees, since they differ from each other.

In order to evaluate these results, we undertook our own analyses of rbcL and 18S data, designed to probe the various sources of error, reasons why estimates have varied so much, and ways to obtain better estimates. Our taxon sampling (modified from Sanderson, 1997) was designed to span critical nodes, provide an adequate sample of extant outgroups, and allow comparisons with previous studies and fossil evidence on the ages of nodes. First, we present a series of analyses that illustrate the effect of various factors on point estimates of the age of angiosperms: variations in tree topology, models for nucleotide substitution (with and without rate variation), sampling of taxa with different rates of evolution (lineage effects), and use of first and second vs. third codon positions (an approximation of nonsynonymous vs. synonymous substitutions). Second, we present a series of resampling experiments designed to provide a statistical estimate of the relative magnitude of errors due to these factors.

MATERIALS AND METHODS

Sequence data. We used published sequences and alignments of the chloroplast rbcL gene (1428 bp; Chase et al., 1993) and the 18S rDNA gene (1842 bp, excluding poorly aligned segments; Chaw et al., 1997; Soltis et al., 1997), supplemented by a few sequences from GenBank. Data sets and published references for species used, vouchers, and alignments are provided at http://loco.ucdavis.edu/sandlab/sl.htm.
Both genes have been widely studied in seed-plant phylogenetics, sampled across a large number and diversity of taxa, and subjected to intense scrutiny with regard to methodological issues such as large data sets and measures of support (Rice, Donoghue, and Olmstead, 1997; Källersjö et al., 1998; Soltis et al., 1998).

Taxa sampled. The 37 taxa in our data set comprise 22 angiosperms, 9 other seed plants, 5 other land plants, and Chara, one of the most closely related green algae, to root land plants (Mishler et al., 1994). To span the root node of extant angiosperms, we included a variety of "magnoliid" taxa, based on current understanding of angiosperm relationships. Analyses of atpB (Savolainen et al., 2000), phytochrome genes (Mathews and Donoghue, 1999), a combined 18S, rbcL, and atpB data set (Soltis et al., 1998; Soltis, Soltis, and Chase, 1999), and five-gene data sets including mitochondrial genes (Parkinson, Adams, and Palmer, 1999; Qiu et al., 1999) indicate that Amborella is the sister group of all other angiosperms, followed by Nymphaeales and then a clade consisting of Austrobaileya, Trimeniaceae, and Illiciales, in agreement with earlier analyses that placed Nymphaeales at the base of angiosperms (Hamby and Zimmer, 1992; Doyle, Donoghue, and Zimmer, 1994; Goremykin et al., 1996). Other analyses link Amborella with Nymphaeales or reverse these two taxa (Barkman et al., 2000; Graham and Olmstead, 2000; Qiu et al., 2000), but these lines are still basal to other angiosperms. We represented these basal lines with Amborella, Nymphaea, and Austrobaileya, and other magnoliid clades (APG, 1998; Qiu et al., 1999; Soltis, Soltis, and Chase, 1999) with Magnolia (Magnoliales), Calycanthus and either Persea or Sassafras (Laurales), Drimys (Winteraceae), Saururus (Piperales), and Chloranthus (Chloranthaceae).
We did not include Ceratophyllum, which is sister to all other angiosperms in trees based on rbcL (Chase et al., 1993), because it is never basal in analyses of other genes. If we had included Ceratophyllum, it would be unclear to what extent our conclusions were a function of this anomalous rooting, without performing additional experiments with topological constraints.

For other seed plants, we included the three genera of Gnetales, Ginkgo, and Cycas and Zamia, the latter representing the basal split in Cycadales. Pinaceae (plus Gnetales in some studies) are the sister group of other conifers in molecular analyses (Chaw et al., 1997, 2000; Stefanovic et al., 1998; Qiu et al., 1999; Bowe, Coat, and dePamphilis, 2000); to span the basal conifer node, we used Picea (Pinaceae), Podocarpus (Podocarpaceae), and Taxus (Taxaceae). In ferns, Osmunda represents Osmundaceae, the probable sister group of other Filicales (Pryer, Smith, and Skog, 1995), exemplified by Asplenium. Marchantia represents liverworts, which morphological and some molecular analyses identify as the sister group of other land plants (Mishler et al., 1994; Qiu et al., 1998). Although other molecular analyses place anthocerotes in this position (Nickrent et al., 2000), this should not be critical for our purposes, since Marchantia is the only bryophytic group in our data set, and at worst Marchantia represents a clade that diverged just one node above the base of land plants.

For 30 species, sequences were available for both genes. For the seven other taxa, we used a different exemplar of the same family for the two genes (18S/rbcL): Nageia/Afrocarpus (Podocarpaceae); Sassafras/Persea (Lauraceae); Calla/Spathiphyllum (Araceae); Veitchia/Drymophloeus (Palmae); Buxus/Pachysandra (Buxaceae); Arctostaphylos/Enkianthus (Ericaceae); Brunfelsia/Nicotiana (Solanaceae).
This procedure may introduce some error because of changes in rate of evolution within families, but presumably these tend to be smaller than changes between families.

Trees. Because one of our goals was to clarify the effect of tree topology on age estimates, we examined a series of eight "standard" trees. Three of these were found by normal parsimony analysis of rbcL and 18S; the other five, intended to represent a range of current hypotheses on seed-plant phylogeny, were obtained by imposing topological constraints during parsimony analysis of rbcL, 18S, or the two data sets combined. Some of these constraints are not directly relevant to seed-plant relationships but were needed to correct anomalies elsewhere in the tree (e.g., in rooting of vascular plants or of angiosperms). These constraints and the reasoning behind their selection are described at the point where each tree is first discussed in the Results section. For these analyses, we used PAUP 3.1 (Swofford, 1991) to find most parsimonious trees, with 100 replicates using stepwise random addition of taxa, MULPARS (multiple most parsimonious trees), TBR (tree bisection-reconnection) branch swapping, and holding one tree at each step.

For several subsequent analyses we used one of these trees, designated the "gnetifer" tree, in which Gnetales are the sister group of conifers and angiosperms are the sister group of other seed plants, as indicated by 18S data (Chaw et al., 1997, 2000; Bowe, Coat, and dePamphilis, 2000). Recent multigene analyses (Qiu et al., 1999; Bowe, Coat, and dePamphilis, 2000; Chaw et al., 2000) have produced somewhat different "gnepine" trees in which Gnetales are nested within now-paraphyletic conifers, linked with Pinaceae, but the gnetifer tree is more consistent with loss of the inverted repeat in the chloroplast genome of conifers but not Gnetales (Raubeson and Jansen, 1992b).
For comparisons with trees of Martin, Gierl, and Saedler (1989) and Martin et al. (1993), we also examined trees including only three angiosperms comparable to those in their study, plus three other subsets of angiosperm taxa, designed to address problems of variation in rates of evolution.

Preliminary hypothesis testing. Prior to estimating ages, we undertook a round of hypothesis testing to infer the tempo and mode of evolution of these genes. We used ML (maximum likelihood) methods (Swofford et al., 1996; Huelsenbeck and Rannala, 1997) for estimation of evolutionary parameters and hypothesis testing. Several models of nucleotide substitution were examined, differing in complexity and number of parameters. The F81 ("Felsenstein 1981"), HKY85 ("Hasegawa-Kishino-Yano 1985"), and GTR (general time-reversible) models estimate one, two, and six parameters in the rate matrix, respectively (Swofford et al., 1996). Site-to-site rate variation was implemented using a gamma distribution of rates (denoted by adding "+ G" to the acronyms above, and referred to as "gamma" in the following discussion). The shape parameter of the gamma distribution is estimated from the data using a four-category discrete approximation. In the absence of rate constancy across lineages, there are also 2N − 2 branch length parameters to be
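
The effect of a gamma distribution of rates across sites can be sketched with pairwise distances rather than the full maximum likelihood machinery described here. The example below swaps in the simple Jukes-Cantor model, for which closed-form distances exist with and without gamma-distributed rates. The observed proportions of differing sites, the shape parameter, and the 450-mya calibration applied to the deeper divergence are hypothetical values for illustration, and the function names are not taken from any particular package.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance (substitutions per site), equal rates across sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates (shape alpha)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


def relative_age(p_node, p_calib, calib_age_mya, distance):
    """Strict-clock age of a node, scaled to a deeper calibrated divergence."""
    return calib_age_mya * distance(p_node) / distance(p_calib)


# Hypothetical inputs: a shallow divergence, a deep calibrated one, strong rate variation.
P_NODE, P_CALIB, CALIB_AGE, ALPHA = 0.10, 0.30, 450.0, 0.5

age_equal = relative_age(P_NODE, P_CALIB, CALIB_AGE, jc_distance)
age_gamma = relative_age(
    P_NODE, P_CALIB, CALIB_AGE, lambda p: jc_gamma_distance(p, ALPHA)
)
print(f"equal rates: {age_equal:.0f} mya; gamma rates: {age_gamma:.0f} mya")
```

With these inputs the gamma correction stretches the deeper divergence more than the shallower one, so the shallower node comes out relatively younger. That is one plausible route by which allowing for site-to-site rate variation could lower age estimates; the toy calculation does not reproduce the analyses in this study.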
