The choice of an 'optimal' mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a 'molecular clock' (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a 'molecular clock,' but produce 'statistically supported' wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an 'optimal' model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a 'prior' distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.
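To make the two distance measures named above concrete, the following is a minimal Python sketch, not the authors' software. It computes the p-distance as the proportion of differing sites between two aligned sequences, and then applies the standard Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names `p_distance` and `jukes_cantor_distance` are hypothetical, and skipping sites with gaps or ambiguous characters is an assumption made for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a gap or an ambiguous
    character (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor distance (expected substitutions per site) from a p-distance.

    The correction is undefined for p >= 0.75, the saturation level
    expected for unrelated sequences under this model.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as the p-distance, because the p-distance does not count multiple substitutions at the same site.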
Rzhetsky, A., & Sitnikova, T. (1996). When is it safe to use an oversimplified substitution model in tree-making? Molecular Biology and Evolution, 13(9), 1255–1265. https://doi.org/10.1093/oxfordjournals.molbev.a025691