Timme and Delwiche BMC Plant Biology 2010, 10:96
http://www.biomedcentral.com/1471-2229/10/96

RESEARCH ARTICLE
Open Access

© 2010 Timme and Delwiche; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Uncovering the evolutionary origin of plant molecular processes: comparison of Coleochaete (Coleochaetales) and Spirogyra (Zygnematales) transcriptomes

Ruth E Timme*1 and Charles F Delwiche1,2

Abstract

Background: The large and diverse land plant lineage is nested within a clade of freshwater green algae, the charophytes. Collection of genome-scale data for land plants and other organisms over the past decade has invigorated the field of evolutionary biology. One of the core questions in the field asks: how did a colonization event by a green alga over 450 million years ago (mya) lead to one of the most successful lineages on the tree of life? This question can best be answered using the comparative method, the first step of which is to gather genome-scale data across lineages closely related to land plants. Before sequencing an entire genome it is useful to first gather transcriptome data: it is less expensive, it targets the protein-coding regions of the genome, and it provides support for gene models for future genome sequencing. We built Expressed Sequence Tag (EST) libraries for two charophyte species, Coleochaete orbicularis (Coleochaetales) and Spirogyra pratensis (Zygnematales). We used both Sanger sequencing and next generation 454 sequencing to cover as much of the transcriptome as possible.

Results: Our sequencing effort for Spirogyra pratensis yielded 9,984 5' Sanger reads plus 598,460 GS FLX Standard 454 sequences; Coleochaete orbicularis yielded 4,992 5' Sanger reads plus 673,811 GS FLX Titanium 454 sequences.
After clustering, S. pratensis yielded approximately 12,000 unique transcripts, or unigenes, and C. orbicularis yielded approximately 19,000. Both transcriptomes were very plant-like, i.e. most of the transcripts were more similar to those of streptophytes (land plants plus charophyte green algae) than to those of green algae in the sister group, the chlorophytes. BLAST searches with several land plant genes hypothesized to be important in early land plant evolution produced high-quality hits in both transcriptomes, revealing putative orthologs ripe for follow-up studies.

Conclusions: Two main conclusions were drawn from this study. First, the study illustrates the utility of next generation sequencing for transcriptome studies: larger scale data collection at a lower cost enabled us to cover a considerable portion of the transcriptome for both species. Second, the charophyte green algal transcriptomes are remarkably plant-like, which gives them the unique capacity to be major players in future evolutionary genomic studies addressing questions about the origin of land plants.

Background

The ancestry of all living land plants (embryophytes) can be traced back to a single colonization event by a charophyte green alga. In other words, the tremendous diversity we see in land plants today, from mosses to redwoods and orchids, descends from a single common ancestor that colonized land 430-470 million years ago [1,2]. Uncertainty remains concerning the precise relationships between embryophytes and their algal relatives [3-8], but there is no serious doubt that land plants originated from within the charophytes. There are six orders of charophyte green algae that, when embryophytes are included, comprise the Streptophyta sensu lato (s.l.) [9]: the Mesostigmatales, Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales and Charales.
* Correspondence: retimme@umd.edu
1 Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, 20742, USA
Full list of author information is available at the end of the article
Both phylogenetic and fossil evidence suggest that these orders are extremely old lineages, comparable in age to the land plants [2]. Therefore, an understanding of the biology of land plants based on comparative genomics would benefit greatly if data were available from these organisms. Unfortunately, in most cases the genome size is poorly characterized, the tools of molecular genetics are not well developed, or cultures are difficult to maintain. Consequently, the acquisition of genomic data from these organisms has lagged behind that of other lineages. To move toward comprehensive genomic analysis of charophytes, we undertook EST analysis of two representative charophytes, Spirogyra pratensis and Coleochaete orbicularis.

Although significant genomic resources are available for the broader group of green algae, including Chlorophyta, there is only one published EST library to date that directly bears on the charophytes, that of Mesostigma viride [10]. Mesostigma is a unicellular, monotypic genus that in some analyses is placed as sister to the rest of the streptophytes [6,11-14], although other studies have placed it as sister to all other green algae [15,16]. In either case, its EST library is a valuable resource for this study. Most taxonomic and ecological diversity in the green algae resides in the Chlorophyta, a large clade sister to the streptophytes. Among the important organisms in this sister clade are the model organism Chlamydomonas reinhardtii and the ecologically significant Ostreococcus tauri. Both of these organisms have fully sequenced and published genomes [17,18].

According to Darwin's centralizing theme of descent with modification, one would predict that every land plant gene should have a homolog in the charophytes unless there was horizontal gene transfer from a non-plant organism, or unprecedented neofunctionalization.
However, any one lineage of charophytes might be expected to have lost or modified some of these genes in the 500 million years or more of independent evolutionary history that separates each lineage from embryophytes. In this context, it is to be expected that land plant genes and their associated molecular pathways either originated in the charophytes or, if more ancient, were retained along the green algal lineages leading up to the colonization of land. Thus, it is important to sample broadly among the charophytes if the homologs of key embryophyte genes are to be identified.

In recent years PCR-based approaches have been used to fish out specific land plant genes of interest in the charophyte lineages, but advances in sequencing technologies have now made it far more efficient to gather high-throughput genomic data and work backwards, using plant gene models to annotate the putative homologous genes. Sequencing expressed sequence tags (ESTs) is an efficient first pass at gathering a large portion of the genomic coding regions. Here we undertook an analysis of two distantly related charophyte taxa: Spirogyra pratensis Transeau (Zygnematales) and Coleochaete orbicularis Pringh. (Coleochaetales). Both of these lineages are essential to understanding the placement of land plants in the context of their nearest living green algal relatives. In addition, evidence of land plant molecular pathways, such as the ethylene response pathway, in the charophytes would reveal the origins of key plant molecular processes.

Results

EST statistics

Our sequencing effort for Spirogyra pratensis yielded 9,984 5' Sanger reads plus 598,460 GS FLX Standard 454 sequences; Coleochaete orbicularis yielded 4,992 5' Sanger reads plus 673,811 GS FLX Titanium 454 sequences (Table 1). The average length of the Sanger sequences was 915 bp (C. orbicularis) or 1,346 bp (S. pratensis) before trimming for low quality and vector sequence.
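As an illustration of how per-dataset summary statistics of this kind (number of sequences, average length, GC content) can be derived from a FASTA file, the following is a minimal Python sketch. It is not the pipeline used in this study: the function names `read_fasta` and `summarize` are hypothetical, the toy sequences are invented, and the sketch assumes that GC content is pooled over all unambiguous bases (A, C, G, T), a detail the text does not specify.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(handle):
    """Return (sequence count, mean length in bp, GC percentage).

    Assumption: GC percentage is pooled over all A/C/G/T bases in the
    dataset; ambiguous bases such as N count toward length but not GC.
    """
    count = total_bases = gc_bases = called_bases = 0
    for _, seq in read_fasta(handle):
        seq = seq.upper()
        count += 1
        total_bases += len(seq)
        gc_bases += seq.count("G") + seq.count("C")
        called_bases += sum(seq.count(base) for base in "ACGT")
    mean_length = total_bases / count if count else 0.0
    gc_percent = 100.0 * gc_bases / called_bases if called_bases else 0.0
    return count, mean_length, gc_percent


# Toy input for illustration only; these are not sequences from the study.
toy = StringIO(">read1\nATGCGC\n>read2\nATAT\nGG\n")
count, mean_length, gc_percent = summarize(toy)
print(count, round(mean_length, 1), round(gc_percent, 1))
```

On the toy input the script prints `2 6.0 50.0`. Pooling bases across the whole dataset weights long sequences more heavily than averaging per-sequence GC values would; either convention is defensible, so the choice here is only an assumption.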
The average length of the raw 454 reads differed between the older GS FLX Standard and the newer GS FLX Titanium sequencing technologies (211 and 378 bp, respectively).

Table 1: EST sequence statistics

                        454 reads   5' Sanger reads   454 assembly   Sanger assembly   Combined assembly
C. orbicularis
  Number of reads         673,811             4,992         26,373             2,455              19,313
  Average length (bp)         378               915            712               721                 813
  GC content                47.9%             46.4%          49.2%             48.6%               49.4%
S. pratensis
  Number of reads         598,460             9,984         12,357             2,836              12,191
  Average length (bp)         211             1,346            493               845                 571
  GC content                42.7%             43.9%          41.1%             42.5%               41.1%

The 454 sequences for each species were trimmed of vector and low-quality sequences, and then