Large scale hierarchical clustering of protein sequences

This article is free to access.

Abstract

Background: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to.

Results: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at 〈http://systers.molgen.mpg.de/〉.

Conclusions: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.

© 2005 Krause et al., licensee BioMed Central Ltd.
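As a rough illustration of what a graph-based hierarchical grouping can look like, the sketch below treats sequences as nodes and pairwise similarity scores as weighted edges, then merges connected components at successively looser thresholds so that each level nests inside the next. This is generic single-linkage clustering, not the authors' algorithm, which the abstract does not specify. The function name `single_linkage_levels`, the score values, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def single_linkage_levels(n, edges, thresholds):
    """Return one partition of nodes 0..n-1 per threshold, strictest first.

    edges: iterable of (i, j, score), where a higher score means more similar.
    Illustrative only: plain single linkage, not the method from the paper.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sorted_edges = sorted(edges, key=lambda e: -e[2])
    levels = []
    k = 0
    for t in sorted(thresholds, reverse=True):
        # Merge every pair whose score reaches the current threshold.
        while k < len(sorted_edges) and sorted_edges[k][2] >= t:
            i, j, _ = sorted_edges[k]
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri
            k += 1
        # Snapshot the connected components at this level.
        groups = defaultdict(list)
        for x in range(n):
            groups[find(x)].append(x)
        levels.append(sorted(sorted(g) for g in groups.values()))
    return levels


# Hypothetical toy data: five sequences, three scored pairs.
toy_edges = [(0, 1, 0.9), (1, 2, 0.5), (3, 4, 0.8)]
family_level, superfamily_level = single_linkage_levels(5, toy_edges, [0.7, 0.4])
```

In this toy run the stricter threshold plays the role of a family level and the looser one a superfamily level. Because merges are never undone, every family cluster is contained in exactly one superfamily cluster.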

Krause, A., Stoye, J., & Vingron, M. (2005). Large scale hierarchical clustering of protein sequences. BMC Bioinformatics, 6. https://doi.org/10.1186/1471-2105-6-15
