Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing

Tobias Wittkop; Jan Baumbach; Francisco P. Lobo; Sven Rahmann

Journal Article, Open Access

BMC Bioinformatics (2007) 8

DOI: 10.1186/1471-2105-8-396


Abstract

Background: Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that have to be clustered efficiently, with constant or increased accuracy and at increased speed.

Results: We advocate that the model of weighted cluster editing, also known as transitive graph projection, is well-suited to protein clustering. We present the FORCE heuristic, which is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (Spectral clustering, TribeMCL, GeneRAGE, Hierarchical clustering, and Affinity Propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192 187 prokaryotic protein sequences (66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet.

Conclusion: FORCE is an applicable alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets, is available at http://gi.cebitec.uni-bielefeld.de/comet/force/.

© 2007 Wittkop et al; licensee BioMed Central Ltd.
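To make the weighted cluster editing objective more concrete, the following is a minimal Python sketch of how the cost of one candidate clustering could be scored. It is an illustration only and not the FORCE implementation (which the abstract says is written in Java and layout based). The function name `editing_cost`, the toy similarity values, the threshold, and the choice to treat missing pairs as similarity 0 are all assumptions made here. The sketch assumes the common formulation in which two objects are connected when their similarity exceeds a threshold, and inserting or deleting an edge costs the absolute difference between its similarity and that threshold.

```python
from itertools import combinations


def editing_cost(similarity, clusters, threshold):
    """Total cost of editing the thresholded similarity graph into `clusters`.

    similarity: dict mapping unordered pairs (u, v) to a similarity score
                (hypothetical input format; missing pairs are assumed to be 0.0).
    clusters:   list of disjoint sets of node names (the target cliques).
    threshold:  pairs with similarity above this value count as edges.
    """
    cluster_of = {node: i for i, members in enumerate(clusters) for node in members}
    cost = 0.0
    for u, v in combinations(sorted(cluster_of), 2):
        s = similarity.get((u, v), similarity.get((v, u), 0.0))
        has_edge = s > threshold
        same_cluster = cluster_of[u] == cluster_of[v]
        # An edge between clusters must be deleted; a missing edge inside
        # a cluster must be inserted. Both cost |s - threshold|.
        if has_edge != same_cluster:
            cost += abs(s - threshold)
    return cost


# Toy data (invented for illustration, not taken from the paper).
similarity = {("a", "b"): 9.0, ("a", "c"): 8.0, ("b", "c"): 3.0, ("c", "d"): 6.0}
threshold = 5.0

grouped = editing_cost(similarity, [{"a", "b", "c"}, {"d"}], threshold)
singletons = editing_cost(similarity, [{"a"}, {"b"}, {"c"}, {"d"}], threshold)
print(grouped, singletons)
```

Under these assumptions, the grouping {a, b, c}, {d} is cheaper than leaving every object in its own cluster. A heuristic for this model searches for a clustering with low total cost rather than scoring a fixed one, since the exact optimisation problem is computationally hard.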


Citation (APA)

Wittkop, T., Baumbach, J., Lobo, F. P., & Rahmann, S. (2007). Large scale clustering of protein sequences with FORCE - A layout based heuristic for weighted cluster editing. BMC Bioinformatics, 8. https://doi.org/10.1186/1471-2105-8-396

