Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology

Lucas D. Wittwer; Ivana Piližota; Adrian M. Altenhoff; Christophe Dessimoz

Journal Article, Open Access

PeerJ (2014) 2014(1)

DOI: 10.7717/peerj.607

Abstract

Orthology inference and other sequence analyses across multiple genomes typically start with exhaustive pairwise sequence comparisons, a process referred to as "all-against-all". Because this process scales quadratically with the number of sequences analysed, it can become a bottleneck that limits the number of genomes that can be analysed simultaneously. Here, we explored ways of speeding up the all-against-all step while maintaining its sensitivity. By exploiting the transitivity of homology and, crucially, ensuring that homology is defined in terms of consistent protein subsequences, our proof-of-concept achieved a 4× speedup while recovering >99.6% of all homologs identified by the full all-against-all procedure on empirical sequence sets. In comparison, state-of-the-art k-mer approaches are orders of magnitude faster but recover only 3–14% of all homologous pairs. We also outline ideas to further improve the speed and recall of the new approach. An open-source implementation is provided as part of the OMA standalone software at http://omabrowser.org/standalone.
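
The idea of restricting transitivity to consistent subsequences can be illustrated with a minimal sketch. It is not the OMA standalone implementation; it assumes, purely for illustration, that each sequence has already been aligned to a shared representative sequence and that the alignment is summarised as a half-open interval on that representative. The names `overlap`, `candidate_pairs`, and `min_overlap`, as well as the threshold values, are hypothetical. Two sequences are proposed as a candidate homologous pair only if their intervals on the representative overlap sufficiently, so that two proteins matching unrelated regions of the same representative are not paired.

```python
from itertools import combinations


def overlap(a, b):
    """Length of the overlap between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def candidate_pairs(hits, min_overlap=20):
    """Propose homolog candidates among sequences that hit one representative.

    hits: dict mapping a sequence id to the (start, end) interval of the
          representative covered by that sequence's alignment (assumed input).
    min_overlap: hypothetical threshold on the shared region, in residues.
    """
    pairs = []
    for (id_a, iv_a), (id_b, iv_b) in combinations(sorted(hits.items()), 2):
        # Transitivity is applied only when both alignments cover a common
        # subsequence of the representative.
        if overlap(iv_a, iv_b) >= min_overlap:
            pairs.append((id_a, id_b))
    return pairs


hits = {"A": (0, 120), "B": (80, 200), "C": (150, 300)}
print(candidate_pairs(hits, min_overlap=30))  # [('A', 'B'), ('B', 'C')]
```

In this toy input, A and C both match the representative but on disjoint regions, so they are not proposed as a pair; in practice, candidate pairs would still need to be confirmed by an actual alignment.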

Cite

Wittwer, L. D., Piližota, I., Altenhoff, A. M., & Dessimoz, C. (2014). Speeding up all-against-all protein comparisons while maintaining sensitivity by considering subsequence-level homology. PeerJ, 2014(1). https://doi.org/10.7717/peerj.607
