Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences

2Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

Abstract

Interpreting protein function from sequence data is a fundamental goal of bioinformatics. However, our current understanding of protein diversity is bottlenecked by the fact that most proteins have only been functionally validated in model organisms, limiting our understanding of how function varies with gene sequence diversity. Thus, accuracy of inferences in clades without model representatives is questionable. Unsupervised learning may help to ameliorate this bias by identifying highly complex patterns and structure from large data sets without external labels. Here, we present DeepSeqProt, an unsupervised deep learning program for exploring large protein sequence data sets. DeepSeqProt is a clustering tool capable of distinguishing between broad classes of proteins while learning local and global structure of functional space. DeepSeqProt is capable of learning salient biological features from unaligned, unannotated sequences. DeepSeqProt is more likely to capture complete protein families and statistically significant shared ontologies within proteomes than other clustering methods. We hope this framework will prove of use to researchers and provide a preliminary step in further developing unsupervised deep learning in molecular biology.

Cite

CITATION STYLE

APA

David, K. T., & Halanych, K. M. (2023). Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences. Genome Biology and Evolution, 15(5). https://doi.org/10.1093/gbe/evad084

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free