A set-theoretic approach to database searching and clustering

Abstract

Motivation: We introduce an iterative method of database searching and use it to design a clustering algorithm applicable to an entire protein database. The clustering procedure relies on the quality of the database searching routine and further improves its results through a set-theoretic analysis of a cluster system that is highly redundant yet efficient to generate.

Results: Overall, we achieve unambiguous assignment of 80% of SWISS-PROT sequences to non-overlapping sequence clusters in an entirely automatic fashion. Our results are compared to an expert-generated clustering for validation. The database searching method is fast, and the clustering technique does not require a time-consuming all-against-all comparison. This allows fast clustering of large numbers of sequences.

Availability: The resulting clustering for the PIR1 (Release 51) and SWISS-PROT (Release 34) databases is available over the Internet from http://www.dkfz-heidelberg.de/tbi/services/modest/browsesysters.pl.

Contact: a.krause@dkfz-heidelberg.de; m.vingron@dkfz-heidelberg.de

Cite

Krause, A., & Vingron, M. (1998). A set-theoretic approach to database searching and clustering. Bioinformatics, 14(5), 430–438. https://doi.org/10.1093/bioinformatics/14.5.430
