Learning a deep language model for microbiomes: The power of large scale unlabeled microbiome data

Abstract

We use open-source human gut microbiome data to learn a microbial “language” model by adapting techniques from Natural Language Processing (NLP). Our microbial “language” model is trained in a self-supervised fashion (i.e., without additional external labels) to capture the interactions among different microbial taxa and the common compositional patterns in microbial communities. The learned model produces contextualized taxon representations that allow a single microbial taxon to be represented differently according to the specific microbial environment in which it appears. The model further provides a sample representation by collectively interpreting the microbial taxa in the sample and their interactions as a whole. We demonstrate that, while our sample representation performs comparably to baseline models on in-domain prediction tasks such as predicting Inflammatory Bowel Disease (IBD) and diet patterns, it significantly outperforms them when generalizing to test data from independent studies, even in the presence of substantial distribution shifts. Through a variety of analyses, we further show that the pretrained, context-sensitive embedding captures meaningful biological information, including taxonomic relationships, correlations with biological pathways, and relevance to IBD expression, despite the model never being explicitly exposed to such signals.
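As a rough illustration of what "self-supervised" can mean here, the sketch below turns one sample's abundance profile into a sequence of taxon tokens and hides some of them, so that a model could be trained to recover the hidden taxa from their context. This is only a minimal sketch under stated assumptions: the abstract does not specify the pretraining objective, the tokenization, or the ordering of taxa, so the abundance-based ordering, the `[MASK]` token, and the helper names `sample_to_tokens` and `mask_tokens` are all hypothetical and not the authors' method.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def sample_to_tokens(abundances, max_len=None):
    """Turn one sample into a 'sentence' of taxa.

    Assumption: taxa with nonzero counts are ordered by decreasing
    abundance, with ties broken alphabetically by taxon name.
    """
    present = [(taxon, count) for taxon, count in abundances.items() if count > 0]
    present.sort(key=lambda tc: (-tc[1], tc[0]))
    tokens = [taxon for taxon, _ in present]
    return tokens if max_len is None else tokens[:max_len]


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Hide a random subset of taxa.

    Returns the masked sequence and a dict mapping each masked position
    to the taxon that was hidden there (the prediction targets).
    """
    rng = rng or random.Random()
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = tok
    return masked, targets


# Toy abundance profile (made-up counts, for illustration only).
sample = {"Bacteroides": 120, "Faecalibacterium": 45, "Escherichia": 0, "Roseburia": 45}
tokens = sample_to_tokens(sample)
masked, targets = mask_tokens(tokens, mask_prob=0.5, rng=random.Random(0))
```

No labels such as disease status enter this step: the training signal comes entirely from the sample's own composition, which is what allows large unlabeled collections to be used.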

Citation

Pope, Q., Varma, R., Tataru, C., David, M. M., & Fern, X. (2025). Learning a deep language model for microbiomes: The power of large scale unlabeled microbiome data. PLoS Computational Biology, 21(5). https://doi.org/10.1371/journal.pcbi.1011353
