LETTERS

A core gut microbiome in obese and lean twins

Peter J. Turnbaugh1, Micah Hamady3, Tanya Yatsunenko1, Brandi L. Cantarel5, Alexis Duncan2, Ruth E. Ley1, Mitchell L. Sogin6, William J. Jones7, Bruce A. Roe8, Jason P. Affourtit9, Michael Egholm9, Bernard Henrissat5, Andrew C. Heath2, Rob Knight4 & Jeffrey I. Gordon1

The human distal gut harbours a vast ensemble of microbes (the microbiota) that provide important metabolic capabilities, including the ability to extract energy from otherwise indigestible dietary polysaccharides1–6. Studies of a few unrelated, healthy adults have revealed substantial diversity in their gut communities, as measured by sequencing 16S rRNA genes6–8, yet how this diversity relates to function and to the rest of the genes in the collective genomes of the microbiota (the gut microbiome) remains obscure. Studies of lean and obese mice suggest that the gut microbiota affects energy balance by influencing the efficiency of calorie harvest from the diet, and how this harvested energy is used and stored3–5. Here we characterize the faecal microbial communities of adult female monozygotic and dizygotic twin pairs concordant for leanness or obesity, and their mothers, to address how host genotype, environmental exposure and host adiposity influence the gut microbiome. Analysis of 154 individuals yielded 9,920 near full-length and 1,937,461 partial bacterial 16S rRNA sequences, plus 2.14 gigabases from their microbiomes. The results reveal that the human gut microbiome is shared among family members, but that each person’s gut microbial community varies in the specific bacterial lineages present, with a comparable degree of co-variation between adult monozygotic and dizygotic twin pairs. However, there was a wide array of shared microbial genes among sampled individuals, comprising an extensive, identifiable ‘core microbiome’ at the gene, rather than at the organismal lineage, level.
Obesity is associated with phylum-level changes in the microbiota, reduced bacterial diversity and altered representation of bacterial genes and metabolic pathways. These results demonstrate that a diversity of organismal assemblages can nonetheless yield a core microbiome at a functional level, and that deviations from this core are associated with different physiological states (obese compared with lean).

We characterized gut microbial communities in 31 monozygotic twin pairs, 23 dizygotic twin pairs and, where available, their mothers (n = 46) (Supplementary Tables 1–5). Monozygotic and dizygotic co-twins and parent–offspring pairs provided an attractive model for assessing the impact of genotype and shared early environmental exposures on the gut microbiome. Moreover, genetically ‘identical’9 monozygotic twin pairs gain weight in response to overfeeding in a more reproducible way than unrelated individuals10 and are more concordant for body mass index (BMI) than dizygotic twin pairs11.

Twin pairs who had been enrolled in the Missouri Adolescent Female Twin Study (MOAFTS12) were recruited for this study (mean period of enrolment in MOAFTS, 11.7 ± 1.2 years; range, 4.4–13.0 years). Twins were 21–32 years old, of European or African ancestry, and were generally concordant for obesity (BMI ≥ 30 kg m^-2) or leanness (BMI = 18.5–24.9 kg m^-2) (one twin pair was lean/overweight (overweight defined as BMI ≥ 25 and < 30) and six pairs were overweight/obese). They had not taken antibiotics for at least 5.49 ± 0.09 months. Each participant completed a detailed medical, lifestyle and dietary questionnaire: study enrolees were broadly representative of the overall Missouri population for BMI, parity, education and marital status (see Supplementary Results). Although all were born in Missouri, they currently live throughout the USA: 29% live in the same house, but some live more than 800 km apart.
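The BMI categories used above to class twins as lean, overweight or obese can be illustrated with a minimal Python sketch. The function names, the 'underweight' label for values below 18.5 and the treatment of values between 24.9 and 25 as lean are assumptions made for this example; they are not part of the study protocol.

```python
# Rank order of the categories; "underweight" is an assumed label (not in the study).
CATEGORY_ORDER = ["underweight", "lean", "overweight", "obese"]


def bmi(weight_kg, height_m):
    """Body mass index in kg m^-2."""
    return weight_kg / (height_m ** 2)


def bmi_category(value):
    """Apply the thresholds quoted in the text: obese >= 30, overweight >= 25 and < 30,
    lean 18.5-24.9 (values between 24.9 and 25 are treated as lean here)."""
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    if value >= 18.5:
        return "lean"
    return "underweight"


def pair_status(bmi_a, bmi_b):
    """Describe a twin pair as concordant, or as 'lower/higher' when discordant."""
    cats = sorted(
        [bmi_category(bmi_a), bmi_category(bmi_b)],
        key=CATEGORY_ORDER.index,
    )
    if cats[0] == cats[1]:
        return "concordant " + cats[0]
    return "/".join(cats)
```

For example, `pair_status(22.0, 27.0)` gives `"lean/overweight"`, matching the wording used for the discordant pairs in the text.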
Because faecal samples are readily attainable and representative of interpersonal differences in gut microbial ecology7, they were collected from each individual and frozen immediately. The collection procedure was repeated again with an average interval between sampling of 57 ± 4 days.

To characterize the bacterial lineages present in the faecal microbiotas of these 154 individuals, we performed 16S rRNA sequencing, targeting the full-length gene with an ABI 3730xl capillary sequencer. Additionally, we performed multiplex pyrosequencing with a 454 FLX instrument to survey the gene’s V2 variable region13 and its V6 hypervariable region14 (Supplementary Tables 1–3).

Complementary phylogenetic and taxon-based methods were used to compare 16S rRNA sequences among faecal communities (see Methods). No matter which region of the gene was examined, individuals from the same family (a twin and her co-twin, or twins and their mother) had a more similar bacterial community structure than unrelated individuals (Fig. 1a and Supplementary Fig. 1a, b), and shared significantly more species-level phylotypes (16S rRNA sequences with ≥97% identity comprise each phylotype) (G = 55.2, P < 10^-12 (V2); G = 12.3, P < 0.001 (V6); G = 11.3, P < 0.001 (full-length)). No significant correlation was seen between the degree of physical separation of family members’ current homes and the degree of similarity between their microbial communities (defined by UniFrac15). The observed familial similarity was not due to an indirect effect of the physiological states of obesity versus leanness: similar results were observed after stratifying twin pairs and their mothers by BMI category (concordant lean or concordant obese individuals; Supplementary Fig. 2). Surprisingly, there was no significant difference in the degree of similarity in the gut microbiotas of adult monozygotic compared with dizygotic twin pairs (Fig. 1a).
However, we could not assess whether monozygotic and dizygotic twin pairs had different degrees of similarities at earlier stages of their lives.

1 Center for Genome Sciences. 2 Department of Psychiatry, Washington University School of Medicine, St Louis, Missouri 63108, USA. 3 Department of Computer Science. 4 Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA. 5 CNRS, UMR6098, Marseille, France. 6 Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, Massachusetts 02543, USA. 7 Environmental Genomics Core Facility, University of South Carolina, Columbia, South Carolina 29208, USA. 8 Department of Chemistry and Biochemistry and the Advanced Center for Genome Technology, University of Oklahoma, Norman, Oklahoma 73019, USA. 9 454 Life Sciences, Branford, Connecticut 06405, USA.
Vol 457 | 22 January 2009 | doi:10.1038/nature07540
©2009 Macmillan Publishers Limited. All rights reserved

Multiplex pyrosequencing of V2 and V6 amplicons allowed higher levels of coverage compared with what was feasible using Sanger sequencing, reaching on average 3,984 ± 232 (V2) and 24,786 ± 1,403 (V6) sequences per sample. To control for differences in coverage, all analyses were performed on an equal number of randomly selected sequences (200 full-length, 1,000 V2 and 10,000 V6). At this level of coverage, there was little overlap between the sampled faecal communities. Moreover, the number of 16S rRNA gene sequences belonging to each phylotype varied greatly between faecal microbiotas (Supplementary Tables 6–8). Because this apparent lack of overlap could reflect the level of coverage (Supplementary Tables 1–3), we subsequently searched all hosts for bacterial phylotypes present at high abundance using a sampling model based on a combination of standard Poisson and binomial sampling statistics. The analysis allowed us to conclude that no phylotype was present at more than about 0.5% abundance in all of the samples in this study (see Supplementary Results). Finally, we sub-sampled our data set by randomly selecting 50–3,000 sequences per sample; again, no phylotypes were detectable in all individuals sampled within this range of coverage (Supplementary Fig. 3).

Samples taken from the same individual at the initial collection point and 57 ± 4 days later were consistent with respect to the specific phylotypes found (Supplementary Figs 4 and 5), but showed variations in relative abundance of the major gut bacterial phyla (Supplementary Fig. 6). There was no significant association between UniFrac distance and the time between sample collections. Overall, faecal samples from the same individual were much more similar to one another than samples from family members or unrelated individuals (Fig. 1a), demonstrating that short-term temporal changes in community structure within an individual are minor compared with inter-personal differences.
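The equal-depth subsampling and the search for phylotypes shared by all hosts, described above, can be sketched as follows. This is a simplified illustration only: the sampling model actually used is given in the Supplementary Results and combines Poisson and binomial statistics, whereas the sketch uses a plain binomial calculation. The function names and the representation of each sample as a list of phylotype labels (one per read) are assumptions.

```python
import random


def rarefy(reads, depth, seed=0):
    """Randomly select `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads, so that
    under-sequenced samples can be excluded rather than compared unfairly.
    """
    if len(reads) < depth:
        return None
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(reads, depth)


def shared_phylotypes(samples):
    """Phylotype labels observed at least once in every sample."""
    sets = [set(sample) for sample in samples]
    return set.intersection(*sets) if sets else set()


def prob_missed(abundance, depth):
    """Binomial probability that a phylotype at relative abundance
    `abundance` contributes zero reads among `depth` sampled reads."""
    return (1.0 - abundance) ** depth
```

Under this simplified model, a phylotype at 0.5% abundance would be missed in a 1,000-read sample with probability `prob_missed(0.005, 1000)`, which is below 1%. This gives some intuition for why the absence of any universally detected phylotype is informative about abundance.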
Analysis of 16S rRNA data sets produced by the three PCR-based methods, plus shotgun sequencing of community DNA (see below), revealed a lower proportion of Bacteroidetes and a higher proportion of Actinobacteria in obese compared with lean individuals of both ancestries (Supplementary Table 9). Combining the individual P values across these independent analyses using Fisher’s method disclosed significantly fewer Bacteroidetes (P = 0.003), more Actinobacteria (P = 0.002) but no significant difference in Firmicutes (P = 0.09). These findings agree with previous work showing comparable differences in both taxa in mice2 and a progressive increase in the representation of Bacteroidetes when 12 unrelated, obese humans lost weight after being placed on one of two reduced-calorie diets6.

Across all methods, obesity was associated with a significant decrease in the level of diversity (Fig. 1b and Supplementary Fig. 1c–f). This reduced diversity suggests an analogy: the obese gut microbiota is not like a rainforest or reef, which are adapted to high energy flux and are highly diverse; rather, it may be more like a fertilizer runoff where a reduced-diversity microbial community blooms with abnormal energy input16.

We subsequently characterized the microbial lineage and gene content of the faecal microbiomes of 18 individuals representing six of the families (three lean and three obese European ancestry monozygotic twin pairs and their mothers) through shotgun pyrosequencing (Supplementary Tables 4 and 5) and BLASTX comparisons against several databases (KEGG17 (version 44) and STRING18) plus a custom database of 44 reference human gut microbial genomes (Supplementary Figs 7–10 and Supplementary Results). Our analysis parameters were validated using control data sets comprising randomly fragmented microbial genes with annotations in the KEGG database17 (Supplementary Fig. 11 and Supplementary Methods).
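Fisher’s method, used above to combine P values from the independent sequencing analyses, can be sketched in a few lines of Python. The statistic is X = -2 Σ ln(p_i), which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis; because the degrees of freedom are even, the tail probability has a closed form and no statistics library is needed. The function name is an assumption, and the input values in the usage note are hypothetical, not the per-method P values from Supplementary Table 9.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-squared with 2k degrees of freedom, whose
    survival function for even degrees of freedom is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one P value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total
```

As a check on the formula, a single P value is returned unchanged, and two hypothetical P values of 0.05 combine to roughly 0.017.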
We also tested how technical advances that produce longer reads might improve these assignments by sequencing faecal community samples from one twin pair using Titanium pyrosequencing methods (average read length of 341 ± 134 nucleotides (s.d.) versus 208 ± 68 nucleotides for the standard FLX method). Supplementary Fig. 12 shows that the frequency and quality of sequence assignments is improved as read length increases from 200 to 350 nucleotides.

The 18 microbiomes were searched to identify sequences matching domains from experimentally validated carbohydrate-active enzymes (CAZymes). Sequences matching 156 total CAZy families were found within at least one human gut microbiome, including 77 glycoside hydrolase, 21 carbohydrate-binding module, 35 glycosyltransferase, 12 polysaccharide lyase and 11 carbohydrate-esterase families (Supplementary Table 10). On average, 2.62 ± 0.13% of the sequences in the gut microbiome could be assigned to CAZymes (a total of 217,615 sequences), a percentage that is greater than the most abundant KEGG pathway (‘Transporters’; 1.20 ± 0.06% of the filtered sequences generated from each sample) and indicative of the abundant and diverse set of microbial genes directed towards accessing a wide range of polysaccharides.

Category-based clustering of the functions from each microbiome was performed using principal components analysis (PCA) and hierarchical clustering19. Two distinct clusters of gut microbiomes were identified based on metabolic profile, corresponding to samples with an increased abundance of Firmicutes and Actinobacteria, and samples with a high abundance of Bacteroidetes (Fig. 2a). A linear regression of the first principal component (PC1, explaining 20% of the functional variance) and the relative abundance of the Bacteroidetes showed a highly significant correlation (R^2 = 0.96, P < 10^-12; Fig. 2b).
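The regression of PC1 on Bacteroidetes abundance reported above is an ordinary least-squares fit with one predictor. A minimal Python sketch of how the slope, intercept and R^2 of such a fit are obtained is given below; the function name is an assumption and the numbers in the usage note are hypothetical, not the values plotted in Fig. 2b.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared), where r_squared is the squared
    Pearson correlation between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

For instance, with hypothetical abundances `[0, 1, 2, 3]` and PC1 scores `[1, 3, 5, 7]`, the fit is exact (slope 2, intercept 1, R^2 of 1). Real data such as those in Fig. 2b would scatter around the line, giving an R^2 below 1.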
Functional profiles stabilized within each individual’s microbiome after 20,000 sequences had been accumulated (Supplementary Fig. 13). Family members had more similar profiles than unrelated individuals (Fig. 2c), suggesting that shared bacterial community structure (‘who’s there’ based on 16S rRNA analyses) also translates into shared community-wide relative abundance of metabolic pathways. Accordingly, a direct comparison of functional

[Figure 1 image. Panel a: UniFrac distance (axis from 0.66, more similar, to 0.82, more different); category labels Self, Twin–twin, Twin–mother, Unrelated, Monozygotic, Dizygotic. Panel b: phylogenetic diversity against number of sequences (0–10,000) for lean and obese individuals.]

Figure 1 | 16S rRNA gene surveys reveal familial similarity and reduced diversity of the gut microbiota in obese individuals. a, Average unweighted UniFrac distance (a measure of differences in bacterial community structure) between individuals over time (self), twin pairs, twins and their mother, and unrelated individuals (1,000 sequences per V2 data set; Student’s t-test with Monte Carlo; *P < 10^-5; **P < 10^-14; ***P < 10^-41; mean ± s.e.m.). b, Phylogenetic diversity curves for the microbiota of lean and obese individuals (based on 1–10,000 sequences per V6 data set; mean ± 95% confidence intervals shown).
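The unweighted UniFrac distance plotted in Fig. 1a is the fraction of branch length in a phylogenetic tree that leads to members of only one of the two communities being compared. The following Python sketch computes it on a toy tree. The tree encoding (a dictionary mapping each node to its parent and branch length), the function name and the toy data are assumptions made for illustration; this is not the implementation used in the study.

```python
def unweighted_unifrac(parents, community_a, community_b):
    """Unweighted UniFrac distance between two sets of leaf labels.

    `parents` maps every non-root node to (parent_node, branch_length).
    A branch is 'unique' when the leaves beneath it come from only one
    community. Leaf labels absent from the tree are silently ignored.
    """
    seen = {}  # node -> set of community labels found beneath its branch
    for label, leaves in (("a", community_a), ("b", community_b)):
        for leaf in leaves:
            node = leaf
            while node in parents:  # walk up until the root is reached
                seen.setdefault(node, set()).add(label)
                node = parents[node][0]
    total = sum(parents[node][1] for node in seen)
    unique = sum(
        parents[node][1] for node, labels in seen.items() if len(labels) == 1
    )
    return unique / total if total else 0.0


# Toy tree: root -> n1 -> (a, b), root -> n2 -> c; every branch has length 1.
TOY_TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
    "a": ("n1", 1.0),
    "b": ("n1", 1.0),
    "c": ("n2", 1.0),
}
```

On the toy tree, communities `{"a", "b"}` and `{"b", "c"}` share the branches to `b` and `n1`, so three of the five branch-length units are unique and the distance is 0.6. Identical communities give 0 and communities with no shared branches give 1.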