AlignBucket: A tool to speed up 'all-against-all' protein sequence alignments optimizing length constraints

Abstract

Motivation: The next-generation sequencing era requires reliable, fast and efficient approaches for the accurate annotation of the ever-increasing number of biological sequences and their variations. Transfer of annotation upon similarity search is a standard approach. All-against-all protein comparison is a preliminary step in several available methods that annotate sequences based on information already present in databases. Given the current volume of sequences, methods are needed that pre-process the data to reduce the time spent on sequence comparison.

Results: We present an algorithm that optimizes the partition of a large volume of sequences (the whole database) into sets where sequence length values (in residues) are constrained depending on a bounded minimal and expected alignment coverage. The idea is to optimally group protein sequences according to their length, and then to compute the all-against-all sequence alignments only among sequences that fall in a selected length range. We describe a mathematically optimal solution and we show that our method leads to a 5-fold speed-up in real-world cases.
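The length-constraint idea can be illustrated with a minimal Python sketch. It is not the AlignBucket algorithm and does not compute the optimal partition described in the paper; it only shows why a minimum coverage bound lets many pairs be skipped. The function name `candidate_pairs`, the parameter `min_coverage`, and the definition of coverage as the shorter length divided by the longer length are assumptions made for this illustration.

```python
from bisect import bisect_right


def candidate_pairs(lengths, min_coverage):
    """Return index pairs (i, j) whose lengths could still reach min_coverage.

    Assumption (not from the paper): coverage is bounded above by
    shorter_length / longer_length, so a pair is worth aligning only
    if that ratio is at least min_coverage.
    """
    if not 0 < min_coverage <= 1:
        raise ValueError("min_coverage must be in (0, 1]")

    # Sort sequence indices by length so each length range is contiguous.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    sorted_lengths = [lengths[i] for i in order]

    pairs = []
    for pos, short_len in enumerate(sorted_lengths):
        # Longest partner that still satisfies short_len / long_len >= min_coverage.
        max_len = short_len / min_coverage
        end = bisect_right(sorted_lengths, max_len)
        for other in range(pos + 1, end):
            pairs.append((order[pos], order[other]))
    return pairs


if __name__ == "__main__":
    lengths = [100, 120, 200, 450, 500]
    kept = candidate_pairs(lengths, min_coverage=0.5)
    total = len(lengths) * (len(lengths) - 1) // 2
    print(f"{len(kept)} of {total} pairs need to be aligned")
```

With these five hypothetical lengths and a coverage bound of 0.5, only the pairs whose length ratio is at least one half remain as alignment candidates; all other pairs are discarded before any alignment is run.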

Cite

Profiti, G., Fariselli, P., & Casadio, R. (2015). AlignBucket: A tool to speed up “all-against-all” protein sequence alignments optimizing length constraints. Bioinformatics, 31(23), 3841–3843. https://doi.org/10.1093/bioinformatics/btv451