Fast alignment-free sequence comparison using spaced-word frequencies

Chris Andre Leimeister; Marcus Boden; Sebastian Horwege; Sebastian Lindner; Burkhard Morgenstern

Journal Article (Open Access)

Bioinformatics (2014) 30(14) 1991-1999

DOI: 10.1093/bioinformatics/btu177

Abstract

Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent.

Results: To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words.

© The Author 2014.
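
To make the idea of spaced words concrete, the following is a minimal Python sketch, not the authors' implementation (which the abstract describes as using recursive hashing and bit operations). It assumes a pattern is written as a string of '1' (match) and '0' (don't care) characters. The function names, the normalisation to relative frequencies, and the choice of Euclidean distance are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def spaced_words(seq, pattern):
    """Yield the spaced words of `seq` for a pattern such as '1101'.

    '1' marks a match position (the character is kept) and '0' marks a
    don't-care position (the character is skipped).
    """
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in keep)


def spaced_word_frequencies(seq, patterns):
    """Relative frequencies of spaced words over one or more patterns.

    Keys are (pattern, word) pairs, so words from different patterns are
    counted separately. Normalising by the total count is an assumption
    of this sketch.
    """
    counts = Counter()
    for pattern in patterns:
        for word in spaced_words(seq, pattern):
            counts[(pattern, word)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two frequency dictionaries."""
    keys = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) ** 2 for k in keys))


if __name__ == "__main__":
    patterns = ["1101", "1011"]  # hypothetical multiple-pattern set
    fa = spaced_word_frequencies("ACGTACGTTGCA", patterns)
    fb = spaced_word_frequencies("ACGTTCGTAGCA", patterns)
    print(euclidean_distance(fa, fb))
```

With the pattern '1101', the window ACGT yields the spaced word ACT: the third position is ignored, so a mismatch there does not affect the word. Pairwise distances computed this way could then be passed to a distance-based tree-building method.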

Cite

Leimeister, C. A., Boden, M., Horwege, S., Lindner, S., & Morgenstern, B. (2014). Fast alignment-free sequence comparison using spaced-word frequencies. Bioinformatics, 30(14), 1991–1999. https://doi.org/10.1093/bioinformatics/btu177
