A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins

Wei Cao; Lu Yun Wu; Xia Yu Xia; Xiang Chen; Zhi Xin Wang; Xian Ming Pan

Journal Article, Open Access

Scientific Reports (2023) 13(1)

DOI: 10.1038/s41598-023-47496-9

Abstract

Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.

Cite

Cao, W., Wu, L. Y., Xia, X. Y., Chen, X., Wang, Z. X., & Pan, X. M. (2023). A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins. Scientific Reports, 13(1). https://doi.org/10.1038/s41598-023-47496-9
