Clustering Protein Sequences Using Affinity Propagation Based on an Improved Similarity Measure

Fan Yang; Qing Xin Zhu; Dong Ming Tang; Ming Yuan Zhao

Journal Article, Open Access

Evolutionary Bioinformatics (2009) 5

DOI: 10.4137/EBO.S3267

Abstract

Protein databases are growing rapidly, so it is increasingly important to cluster protein sequences using sequence information alone. In this paper we improve the similarity measure proposed by Kelil et al., cluster sequences with the affinity propagation (AP) algorithm, and provide a method for choosing the input preference of the AP algorithm. We tested our method extensively and compared its performance with four other methods on several datasets from the COG, G protein, CAZy, and SCOP databases. We consistently observed that the number of clusters obtained for a given set of proteins approximates the correct number of clusters in that set. Moreover, in our experiments, the quality of the clusters as quantified by the F-measure was better than that of the other algorithms (on average, 15% better than BlastClust, 56% better than TribeMCL, 23% better than CLUSS, and 42% better than spectral clustering).
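To make the role of the AP input preference concrete, here is a minimal sketch of affinity propagation on a precomputed similarity matrix. It is an illustration under stated assumptions, not the authors' implementation. The toy one-dimensional `points`, the negative squared distance used as a stand-in similarity, and the median-of-similarities preference are all assumptions made for this example. The abstract does not spell out the improved similarity measure or the authors' rule for choosing the preference, so neither is reproduced here.

```python
from statistics import median


def affinity_propagation(similarity, preference, damping=0.5, iterations=200):
    """Return, for each item, the index of the exemplar it is assigned to."""
    n = len(similarity)
    # The preference goes on the diagonal: s(k, k) controls how readily
    # item k becomes an exemplar, and so how many clusters emerge.
    s = [[preference if i == k else similarity[i][k] for k in range(n)]
         for i in range(n)]
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        for i in range(n):
            for k in range(n):
                rival = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - rival)
        # a(i, k) = min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) = sum of positive r(i', k), i' != k
        for k in range(n):
            positive = [max(0.0, r[j][k]) for j in range(n)]
            support = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    update = support
                else:
                    update = min(0.0, r[k][k] + support - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * update

    # Each item picks the candidate exemplar with the largest a + r.
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]


# Hypothetical toy data: two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(points)
# Stand-in similarity (assumption): negative squared distance.
similarity = [[-(p - q) ** 2 for q in points] for p in points]
# Assumed preference rule: median of the off-diagonal similarities.
preference = median(similarity[i][k] for i in range(n) for k in range(n) if i != k)

labels = affinity_propagation(similarity, preference)
print(labels)
```

In this sketch, a lower (more negative) preference tends to produce fewer clusters and a higher one tends to produce more, which is why a principled way of setting it matters when the correct number of clusters is unknown.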

Citation (APA)

Yang, F., Zhu, Q. X., Tang, D. M., & Zhao, M. Y. (2009). Clustering Protein Sequences Using Affinity Propagation Based on an Improved Similarity Measure. Evolutionary Bioinformatics, 5. https://doi.org/10.4137/EBO.S3267
