Misleading results of likelihood-based phylogenetic analyses in the presence of missing data

This article is free to access.

Abstract

The amount of missing data in many contemporary phylogenetic analyses has substantially increased relative to previous norms, particularly in supermatrix studies that compile characters from multiple previous analyses. In such cases the missing data are non-randomly distributed and usually present in all partitions (i.e. groups of characters) sampled. Parametric methods often provide greater resolution and support than parsimony for these matrices, yet this may be caused by extrapolation of branch lengths from one partition to another. In this study I use contrived and simulated examples to demonstrate that likelihood, even when applied to simple matrices with little or no homoplasy, homogeneous evolution across groups of characters, perfect model fit, and hundreds or thousands of variable characters, can provide strong support for incorrect topologies when the missing data are non-randomly distributed across all partitions. I systematically explore alternative seven-taxon tree topologies and distributions of missing data in two partitions to demonstrate that these likelihood-based artefacts may occur frequently and are not shared by parsimony. I also demonstrate that Bayesian Markov chain Monte Carlo analysis is more robust to these artefacts than is likelihood. © 2011 The Willi Hennig Society.
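
To make the kind of matrix described in the abstract concrete, the following is a minimal sketch of a seven-taxon, two-partition layout in which missing data occur in both partitions and are non-randomly distributed. The taxon names, the sampling pattern, and the partition lengths are all hypothetical and are not taken from the article. The sketch only reports how much of the matrix is missing and which taxon pairs have no scored partition in common. It does not perform any likelihood, parsimony, or Bayesian analysis.

```python
from itertools import combinations

# Hypothetical sampling pattern (an assumption, not the article's data):
# for each of seven taxa, whether it is scored in partition 1 and partition 2.
SAMPLED = {
    "A": (True, True),
    "B": (True, False),
    "C": (True, False),
    "D": (True, False),
    "E": (False, True),
    "F": (False, True),
    "G": (False, True),
}

# Hypothetical number of characters in each partition.
PARTITION_LENGTHS = (500, 500)


def missing_fraction(sampled, lengths):
    """Return the fraction of matrix cells that are missing."""
    total_cells = len(sampled) * sum(lengths)
    missing_cells = sum(
        length
        for scored in sampled.values()
        for is_scored, length in zip(scored, lengths)
        if not is_scored
    )
    return missing_cells / total_cells


def pairs_without_overlap(sampled):
    """Return taxon pairs that are never scored for the same partition."""
    return [
        (a, b)
        for a, b in combinations(sorted(sampled), 2)
        if not any(x and y for x, y in zip(sampled[a], sampled[b]))
    ]


if __name__ == "__main__":
    print(f"missing fraction: {missing_fraction(SAMPLED, PARTITION_LENGTHS):.3f}")
    for a, b in pairs_without_overlap(SAMPLED):
        print(f"no shared characters: {a} and {b}")
```

In this hypothetical layout only taxon A is scored for both partitions, so no single character is scored for both members of any pair drawn from {B, C, D} and {E, F, G}. Patterns of this general kind are where information from one partition has to stand in for another.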

Cite

Simmons, M. P. (2012). Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics, 28(2), 208–222. https://doi.org/10.1111/j.1096-0031.2011.00375.x
