LETTERS

Functional metagenomic profiling of nine biomes

Elizabeth A. Dinsdale1,5*, Robert A. Edwards1,2,3,6*, Dana Hall1, Florent Angly1,4, Mya Breitbart7, Jennifer M. Brulc8, Mike Furlan1, Christelle Desnues1†, Matthew Haynes1, Linlin Li1, Lauren McDaniel7, Mary Ann Moran10, Karen E. Nelson11, Christina Nilsson12, Robert Olson6, John Paul7, Beltran Rodriguez Brito1,4, Yijun Ruan12, Brandon K. Swan13, Rick Stevens6, David L. Valentine13, Rebecca Vega Thurber1, Linda Wegley1, Bryan A. White8,9 & Forest Rohwer1,2

Microbial activities shape the biogeochemistry of the planet1,2 and macroorganism health3. Determining the metabolic processes performed by microbes is important both for understanding and for manipulating ecosystems (for example, disruption of key processes that lead to disease, conservation of environmental services, and so on). Describing microbial function is hampered by the inability to culture most microbes and by high levels of genomic plasticity. Metagenomic approaches analyse microbial communities to determine the metabolic processes that are important for growth and survival in any given environment. Here we conduct a metagenomic comparison of almost 15 million sequences from 45 distinct microbiomes and, for the first time, 42 distinct viromes and show that there are strongly discriminatory metabolic profiles across environments. Most of the functional diversity was maintained in all of the communities, but the relative occurrence of metabolisms varied, and the differences between metagenomes predicted the biogeochemical conditions of each environment. The magnitude of the microbial metabolic capabilities encoded by the viromes was extensive, suggesting that they serve as a repository for storing and sharing genes among their microbial hosts and influence global evolutionary and metabolic processes.
Genomic plasticity of microbes causes variations in the gene content of closely related strains4, making predictions of community metabolism on the basis of representative genomes and signature genes such as 16S ribosomal RNA unreliable. Although it seems that core genomes are relatively stable and shared among most individuals of the same species, parts of the genome (for example, prophages, CRISPRs, pathogenicity/ecological islands, ORFans) are hyper-variable5. Together, these two components make up the pan-genome4. Unlike the signature genes approach, metagenomic approaches analyse the complete genetic information of microbial and viral communities6,7. In this way, the relative abundances of all genes can be determined and used to generate a description of the functional potential of each community8–14.

Here we use a comparative metagenomic approach to statistically analyse the frequency distribution of 14,585,213 microbial and viral metagenomic sequences to elucidate the functional potential of nine biomes: subterranean (that is, mine samples); hypersaline ponds from solar salterns; marine; freshwater; coral-associated; microbialites (including stromatolites and thrombolites); aquaculture-fish-associated; terrestrial-animal-associated; and mosquito-associated (details in Supplementary Table 1 and Supplementary Fig. 1). Microbial and viral metagenomes (Supplementary Fig. 2 and Supplementary Table 2) were isolated and pyrosequenced. The sequences were compared to the 2007 SEED platform (http://www.theseed.org) using the BLASTX algorithm, and hits with an E-value of <0.001 were considered to be significant (Methods). A total of 1,040,665 sequences from the 45 microbial metagenomes and 541,979 sequences from the 42 viral metagenomes were significantly similar to functional genes within the SEED (Supplementary Table 1).
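The filtering and counting step described above can be illustrated with a minimal sketch. It assumes BLAST tabular output (12 tab-separated columns, E-value in the eleventh), keeps only the most significant hit per read, and uses a hypothetical `gene_to_subsystem` lookup in place of the SEED annotation; the function name, the lookup and the demo reads are illustrative assumptions, not the pipeline used in the study.

```python
import csv
import io
from collections import Counter

E_VALUE_CUTOFF = 1e-3  # threshold stated in the text: E-value < 0.001


def subsystem_relative_abundance(blast_tsv, gene_to_subsystem, cutoff=E_VALUE_CUTOFF):
    """Return {subsystem: percentage of reads with a significant hit}.

    blast_tsv: text in BLAST tabular format (E-value in column 11).
    gene_to_subsystem: hypothetical lookup from subject ID to subsystem name.
    Assumption: each read is counted once, using its most significant hit.
    """
    best_hit = {}  # read ID -> (E-value, subject ID)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= cutoff or subject not in gene_to_subsystem:
            continue  # not significant, or no functional assignment
        if query not in best_hit or evalue < best_hit[query][0]:
            best_hit[query] = (evalue, subject)
    counts = Counter(gene_to_subsystem[subject] for _, subject in best_hit.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Made-up demo reads: read3 fails the E-value cutoff and is dropped.
demo = "\n".join([
    "read1\tgeneA\t90.0\t30\t3\t0\t1\t90\t1\t30\t1e-10\t60.0",
    "read1\tgeneB\t80.0\t30\t6\t0\t1\t90\t1\t30\t1e-5\t45.0",
    "read2\tgeneB\t85.0\t30\t4\t0\t1\t90\t1\t30\t2e-4\t40.0",
    "read3\tgeneA\t40.0\t30\t18\t0\t1\t90\t1\t30\t0.5\t20.0",
    "read4\tgeneC\t88.0\t30\t3\t0\t1\t90\t1\t30\t1e-8\t55.0",
])
lookup = {"geneA": "Carbohydrates", "geneB": "Respiration", "geneC": "Carbohydrates"}
profile = subsystem_relative_abundance(demo, lookup)
print(profile)
```

Counting each read once is a design choice made here for simplicity; a read matching genes in several subsystems could instead be counted in each of them, which would change the percentages.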
The SEED arranges metabolic pathways into a hierarchical structure in which all of the genes required for a specific task are arranged into subsystems15. At the highest level of organization, the subsystems include both catabolic and anabolic functions (for example, DNA metabolism) and at the lowest levels the subsystems are specific pathways (for example, the synthesis pathway for thymidine). Table 1 shows the relative abundances of sequences assigned to each major subsystem in the combined analysis of the microbiomes compared with the viromes. Over 30% of the identifiable genes in the microbiomes were associated with carbohydrate or protein metabolism. Respiration and photosynthesis subsystems accounted for an additional ~15% of the similarities. Subsystems responsible for nucleic acid metabolism and virulence were overrepresented in the viral fractions (Table 1), whereas respiration and photosynthesis genes were less frequent.

Table 1 | Mean percentage of sequences (± s.e.m.) similar to major metabolisms

Metabolic category               Microbial metagenomes   Viral metagenomes
Carbohydrates                    17.218 (± 0.648)        14.353 (± 0.718)
Amino acids                      12.036 (± 0.491)        10.132 (± 0.642)
Virulence                         9.788 (± 0.339)        11.175 (± 0.508)
Protein metabolism                9.123 (± 0.497)         8.838 (± 0.522)
Respiration                       7.139 (± 1.285)         3.718 (± 0.276)
Photosynthesis                    6.965 (± 2.148)         1.984 (± 0.554)
Cofactors, vitamins, and so on    5.411 (± 0.226)         6.661 (± 0.393)
RNA metabolism                    3.971 (± 0.195)         4.324 (± 0.387)
DNA metabolism                    3.970 (± 0.180)         7.555 (± 0.943)
Nucleosides and nucleotides       3.316 (± 0.149)         7.666 (± 0.817)
Cell wall and capsule             3.235 (± 0.223)         5.098 (± 0.649)
Fatty acids and lipids            3.095 (± 0.160)         3.002 (± 0.242)
Membrane transport                2.736 (± 0.158)         2.371 (± 0.182)
Stress response                   2.599 (± 0.115)         3.354 (± 0.326)
Aromatic compounds                2.351 (± 0.175)         2.550 (± 0.340)
Cell division and cell cycle      1.791 (± 0.091)         1.983 (± 0.212)
Nitrogen metabolism               1.547 (± 0.070)         1.135 (± 0.093)
Sulphur metabolism                1.230 (± 0.102)         1.302 (± 0.134)
Motility and chemotaxis           1.022 (± 0.096)         1.011 (± 0.083)
Phosphorus metabolism             0.909 (± 0.080)         1.319 (± 0.167)
Cell signalling                   0.885 (± 0.076)         0.885 (± 0.072)
Potassium metabolism              0.796 (± 0.048)         0.846 (± 0.079)
Secondary metabolism              0.159 (± 0.014)         0.235 (± 0.047)

*These authors contributed equally to this work.
1 Department of Biology, 2 Center for Microbial Sciences, 3 Department of Computer Sciences, and 4 Computational Science Research Centre, San Diego State University, San Diego, California 92182, USA. 5 School of Biological Sciences, Flinders University, Adelaide, South Australia 5042, Australia. 6 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA. 7 University of South Florida, College of Marine Science, 140 7th Avenue South, St Petersburg, Florida 33701, USA. 8 Department of Animal Sciences, and 9 The Institute for Genomic Biology, University of Illinois, Urbana, Illinois 61801, USA. 10 Department of Marine Sciences, University of Georgia, Athens, Georgia 30602, USA. 11 The J. Craig Venter Institute, 9712 Medical Center Drive, Rockville, Maryland 20850, USA. 12 Genome Institute of Singapore, 60 Biopolis Street, 02-01, Genome, Singapore 138672, Singapore. 13 Department of Earth Science, University of California Santa Barbara, Santa Barbara, California 93106, USA.
†Present address: Unité des Rickettsies, CNRS-UMR 6020, Faculté de médecine, 13385 Marseille, France.
Nature | Vol 452 | 3 April 2008 | doi:10.1038/nature06810 | ©2008 Nature Publishing Group

The functional diversity represented by the metagenomes approached its theoretical limit of 2.81 in all environments (Table 2), showing that most subsystems were represented in all of the samples. Only the coral-associated microbes showed a lower functional diversity; this is because they have fewer secondary metabolisms, virulence pathways, cell signalling pathways and membrane transport pathways. Because microbes associated with corals are taxonomically diverse11, functional reduction may have occurred in these communities, similar to microbes in other symbiotic relationships16. Diversity is a function of both richness (that is, the number of metabolic processes) and evenness (that is, the relative abundance of a particular metabolic process in a sample). The evenness for the metagenomes was very low (<0.1; Table 2 and Supplementary Fig. 3), showing that there are a few dominant metabolisms in each environment. Differences in which metabolisms dominate suggest that the metagenomes have characteristic functional profiles.

To test the hypothesis that each environment has a distinguishing metabolic profile, a canonical discriminant analysis (CDA) was conducted (Fig. 1). Most of the variance between the different environments (79.8% of the combined microbiome and 69.9% of the virome) was explained in this analysis, showing that metagenomes are highly predictive of metabolic potential within an ecosystem.
In contrast, a recent analysis of 16S rRNA genes from multiple environments only explained about 10% of the variance17, suggesting that different ecosystems cannot be distinguished by their taxa.

The position of each metagenome in Fig. 1 reflects the frequency combination of sequences associated with each subsystem; the vectors indicate which metabolisms most strongly determined the distribution. Using these as clues, it is possible to determine which metabolisms are important for the organisms in that environment relative to other environments. For example, subsystems involved in respiration and protein metabolism placed the coral-associated microbes apart from the microbes found within terrestrial animals. This trend is visualized in Fig. 2, which shows that ~20% of the coral-associated microbial genes were involved in respiration, compared with only 3% in the microbiomes associated with terrestrial animals. The relatively high occurrence of respiration-associated genes in the coral-associated microbiomes reflects the diurnally fluctuating oxygen environment, which is supersaturated with oxygen in the day and essentially anaerobic at night18. In contrast, microbes living within the stable anaerobic alimentary tracts of terrestrial animals are less likely to experience selection for multiple respiration pathways. Similarly, virulence genes were proportionally more abundant in the organism-associated microbes than in free-living microbes. These are the factors necessary to facilitate symbiotic relationships (mutualism, parasitism or commensalism; Fig. 2f–h). Another example of the predictive power of the metagenomes is the sulphur metabolisms associated with aquaculture fish. In particular, two subsystems, alkanesulphonate and taurine metabolism, were overrepresented in fish-associated metagenomes (Supplementary Fig. 4).
Alkanesulphonates are involved in the use of both inorganic and organic sulphur, such as taurine and aliphatic sulphonates19 (taurine is a sulphur organic acid used to supplement aquaculture fish food20).

Table 2 | Mean functional diversity and evenness (± s.e.m.) of metagenomes, sampled from nine environments

                      Functional diversity (H′)                 Functional evenness
Biome                 Microbial          Viral                  Microbial                 Viral
Subterranean          2.393 (± 0.030)    -                      0.005 (± 1.2 × 10^-4)     -
Hypersaline           2.361 (± 0.006)    2.041 (± 0.021)        0.005 (± 1.4 × 10^-4)     0.012 (± 5.6 × 10^-4)
Marine                2.313 (± 0.021)    2.162 (± 0.026)        0.005 (± 0.9 × 10^-4)     0.007 (± 4.0 × 10^-4)
Freshwater            2.430 (± 0.003)    2.080 (± 0.034)        0.005 (± 0.9 × 10^-4)     0.010 (± 6.7 × 10^-4)
Coral                 1.733 (± 0.059)    2.289 (± 0.023)        0.009 (± 5.2 × 10^-4)     0.007 (± 1.1 × 10^-4)
Microbialites         2.408 (± 0.015)    1.743 (± 0.115)        0.005 (± 3.8 × 10^-4)     0.019 (± 6.9 × 10^-3)
Fish                  2.447 (± 0.001)    2.439 (± 3.1 × 10^-4)  0.005 (± 0.4 × 10^-4)     0.005 (± 0.7 × 10^-4)
Terrestrial animals   2.428 (± 0.006)    2.016 (± 0.173)        0.004 (± 0.1 × 10^-4)     0.017 (± 4.5 × 10^-3)
Mosquito              -                  2.395 (± 0.015)        -                         0.004 (± 0.5 × 10^-4)

There are no subterranean viral metagenomes and no mosquito microbial metagenomes.

[Figure 1 labels recovered from the extraction. Axes: canonical discriminant function 1 (48.0%; 38.9%) and canonical discriminant function 2 (31.0%; 31.9%). Legend: Subterranean, Hypersaline, Marine, Freshwater, Coral, Microbialites, Fish, Terrestrial animals, Mosquito. Vector labels, first set: Cell wall, Virulence, Membrane transport, Stress, Sulphur, Signalling, Motility, Respiration, Protein. Vector labels, second set: Membrane transport, Carbohydrates, Fatty acids, Secondary metabolites, Phosphorus, Virulence, Cell division, DNA, Potassium, Motility.]

Figure 1 | Functional analysis of microbial and viral metagenomes. The CDA of the microbial (a) and viral (b) metagenomes showed that the metabolic processes grouped these communities in the two-dimensional space described by canonical discriminant functions 1 and 2. The symbols represent the position of each metagenome and the vectors represent the structural matrix for subsystems that were identified as influencing the separation of the metagenomes using the stepwise procedure. The length of the vectors represents the strength of influence of the particular metabolic process. The cross-validation scores for the microbial and viral metagenomes were 66.7% and 59.9%, respectively.
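The functional diversity values in Table 2 are reported as a mean H′ with its standard error across the metagenomes of each biome. A minimal sketch of that kind of summary follows, assuming H′ is the Shannon index computed with natural logarithms over subsystem hit counts. The log base, the number of categories and the evenness formula used in the study are not stated in this section, so the counts below are hypothetical and evenness is not computed.

```python
import math
import statistics


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over categories with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a standard error")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical subsystem hit counts for three metagenomes from one biome.
biome_counts = [
    [120, 80, 60, 40],
    [100, 100, 50, 50],
    [75, 75, 75, 75],  # perfectly even, so H' equals ln(4)
]
h_values = [shannon_diversity(c) for c in biome_counts]
mean_h, sem_h = mean_and_sem(h_values)
max_h = math.log(len(biome_counts[0]))  # upper bound for 4 equally abundant categories

print("H' per metagenome:", [round(h, 3) for h in h_values])
print(f"mean H' = {mean_h:.3f} (± {sem_h:.3f}); maximum = {max_h:.3f}")
```

Under these assumptions H′ reaches its maximum, ln(S) for S categories, only when every category is equally abundant, which is why a value close to the limit indicates that most subsystems are represented.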