Large scale hierarchical clustering of protein sequences

Antje Krause; Jens Stoye; Martin Vingron

Journal Article, Open Access

BMC Bioinformatics (2005) 6

DOI: 10.1186/1471-2105-6-15

Abstract

Background: Searching a biological sequence database with a query sequence to find homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines, it is still virtually impossible to identify quickly and clearly the group of sequences to which a given query sequence belongs.

Results: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences, resulting in a hierarchical clustering that is being made available for querying and browsing at http://systers.molgen.mpg.de/.

Conclusions: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.

© 2005 Krause et al., licensee BioMed Central Ltd.

Citation (APA)

Krause, A., Stoye, J., & Vingron, M. (2005). Large scale hierarchical clustering of protein sequences. BMC Bioinformatics, 6. https://doi.org/10.1186/1471-2105-6-15
