ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2

R. Dustin Schaeffer; Jing Zhang; Kirill E. Medvedev; Lisa N. Kinch; Qian Cong; Nick V. Grishin

Journal Article (Open Access)

PLoS Computational Biology (2024) 20(2)

DOI: 10.1371/journal.pcbi.1011586

Abstract

Protein structure prediction has now been deployed widely across several large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide a positive classification (either of domains or of other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprising over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate the highly populated domains that are shared by eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implications of these results for protein target selection for future classification strategies for very large protein sets.

Cite

Schaeffer, R. D., Zhang, J., Medvedev, K. E., Kinch, L. N., Cong, Q., & Grishin, N. V. (2024). ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Computational Biology, 20(2). https://doi.org/10.1371/journal.pcbi.1011586
