Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis for phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d_N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'.
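
To make the quantity N concrete, the following is a minimal sketch of how spaced-word matches between two DNA sequences could be counted for a single pattern. It is an illustration under stated assumptions, not the authors' implementation: the function names `spaced_words` and `count_spaced_word_matches` are hypothetical, the pattern is assumed to be a string in which '1' marks a match position and '0' a don't-care position, and N is assumed to count every pair of positions (one in each sequence) whose spaced words agree. The sketch stops at counting and does not attempt to reproduce the estimator or the normalization of N.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Yield the spaced word at every start position of seq.

    Assumption: pattern is a string of '1' (match position) and
    '0' (don't-care position); only characters under a '1' are kept.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in match_positions)


def count_spaced_word_matches(seq1, seq2, pattern):
    """Count pairs (i, j) whose spaced words in seq1 and seq2 are identical."""
    counts1 = Counter(spaced_words(seq1, pattern))
    counts2 = Counter(spaced_words(seq2, pattern))
    # Each word occurring a times in seq1 and b times in seq2
    # contributes a * b matching position pairs.
    return sum(n * counts2[word] for word, n in counts1.items())


if __name__ == "__main__":
    s1 = "ACGTACGT"
    s2 = "ACCTACGA"
    print(count_spaced_word_matches(s1, s2, "101"))  # spaced pattern: 5
    print(count_spaced_word_matches(s1, s2, "111"))  # contiguous words: 3
```

A contiguous word of length k is the special case of a pattern made only of '1' characters, so the same function covers both settings. To use multiple patterns, the function can be called once per pattern and the resulting counts combined.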
Morgenstern, B., Zhu, B., Horwege, S., & Leimeister, A. A. (2015). Estimating evolutionary distances between genomic sequences from spaced-word matches. Algorithms for Molecular Biology, 10(1). https://doi.org/10.1186/s13015-015-0032-x