Abstract
The evolutionary history of a set of species is represented by a phylogenetic tree, that is, a rooted, leaf-labelled tree in which internal nodes represent ancestral species and leaves represent modern-day species. Accurate (or even boundedly inaccurate) topology reconstruction of large and divergent trees has long been considered one of the major challenges in systematic biology. None of the polynomial-time methods developed by the theoretical computer science community has been shown to outperform the popular Neighbor-Joining method used by systematic biologists with respect to topology estimation. (However, preliminary experiments indicate that two new variants of Neighbor-Joining, Bio-NJ and Weighbor, do exhibit improved performance.) In this paper, we present a simple polynomial-time method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods. We analyze the performance of DCM-boosted distance methods under the general Markov model of evolution and prove that, with the DCM-boosted Buneman method, polylogarithmic-length sequences suffice for complete accuracy with high probability for almost all trees, while polynomial-length sequences always suffice. Our experimental study (based on simulating sequence evolution on model trees, generating about 1000 datasets) confirms these substantial reductions in error rates and extremely fast convergence rates. In particular, we report that DCM-boosted Neighbor-Joining has only 8% of the error of Neighbor-Joining under conditions that are adverse to Neighbor-Joining, and that on some trees it achieves acceptable error rates (less than 5% error in the topology estimation) from sequences of a few hundred nucleotides, while Neighbor-Joining needs more than 10,000 nucleotides to achieve the same level of accuracy.
Cite
Huson, D. H., Nettles, S., & Warnow, T. J. (1999). Obtaining highly accurate topology estimates of evolutionary trees from very short sequences. Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB, 198–207. https://doi.org/10.1145/299432.299484