When is it safe to use an oversimplified substitution model in tree-making?


Abstract

The choice of an 'optimal' mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a 'molecular clock' (more precisely, if it follows a stationary time reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a 'molecular clock,' but produce 'statistically supported' wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an 'optimal' model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a 'prior' distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.
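The two distance measures the abstract contrasts can be illustrated with a minimal sketch. It assumes two pre-aligned nucleotide sequences and uses the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names and the choice to skip gapped or ambiguous sites are illustrative assumptions, not taken from the article.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: compare only sites where both sequences carry A, C, G or T,
    # so gaps and ambiguity codes are ignored.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p):
    """Jukes-Cantor distance for a given p-distance: d = -(3/4) ln(1 - 4p/3)."""
    if p >= 0.75:
        # The logarithm's argument is non-positive at saturation.
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

The corrected distance is never smaller than the p-distance, and the gap between them widens as sequences diverge, because the p-distance does not account for multiple substitutions at the same site.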

Cite


Rzhetsky, A., & Sitnikova, T. (1996). When is it safe to use an oversimplified substitution model in tree-making? Molecular Biology and Evolution, 13(9), 1255–1265. https://doi.org/10.1093/oxfordjournals.molbev.a025691
