Deciphering microbial gene function using natural language processing

20Citations
Citations of this article
100Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Revealing the function of uncharacterized genes is a fundamental challenge in an era of ever-increasing volumes of sequencing data. Here, we present a concept for tackling this challenge using deep learning methodologies adopted from natural language processing (NLP). We repurpose NLP algorithms to model “gene semantics” based on a biological corpus of more than 360 million microbial genes within their genomic context. We use the language models to predict functional categories for 56,617 genes and find that out of 1369 genes associated with recently discovered defense systems, 98% are inferred correctly. We then systematically evaluate the “discovery potential” of different functional categories, pinpointing those with the most genes yet to be characterized. Finally, we demonstrate our method’s ability to discover systems associated with microbial interaction and defense. Our results highlight that combining microbial genomics and language models is a promising avenue for revealing gene functions in microbes.

Cite

CITATION STYLE

APA

Miller, D., Stern, A., & Burstein, D. (2022). Deciphering microbial gene function using natural language processing. Nature Communications, 13(1). https://doi.org/10.1038/s41467-022-33397-4

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free