Conditioned genome reconstruction: How to avoid choosing the conditioning genome

Matthew Spencer; David Bryant; Edward Susko

Systematic Biology (2007) 56(1) 25-43

DOI: 10.1080/10635150601156313

Abstract

Genome phylogenies can be inferred from data on the presence and absence of genes across taxa. Logdet distances may be a good method, because they allow expected genome size to vary across the tree. Recently, Lake and Rivera proposed conditioned genome reconstruction (calculation of logdet distances using only those genes present in a conditioning genome) to deal with unobservable genes that are absent from every taxon of interest. We prove that their method can consistently estimate the topology for almost any choice of conditioning genome. Nevertheless, the choice of conditioning genome is important for small samples. For real bacterial genome data, different choices of conditioning genome can result in strong bootstrap support for different tree topologies. To overcome this problem, we developed supertree methods that combine information from all choices of conditioning genome. One of these methods, based on the BIONJ algorithm, performs well on simulated data and may have applications to other supertree problems. However, an analysis of 40 bacterial genomes using this method supports an incorrect clade of parasites. This is a common feature of model-based gene content methods and is due to parallel gene loss. Copyright © Society of Systematic Biologists.
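The conditioning step described in the abstract can be illustrated with a minimal Python sketch. It assumes a 0/1 presence/absence matrix and uses a generic normalized LogDet distance for two-state characters, which may differ in detail from the estimator used in the paper. The function name `conditioned_logdet` and the toy matrix `presence` are hypothetical and exist only for illustration.

```python
import numpy as np


def conditioned_logdet(presence, cond, i, j):
    """LogDet distance between taxa i and j, using only the genes
    that are present in the conditioning genome `cond`.

    presence: (n_taxa, n_genes) array of 0/1 values (assumed layout).
    Uses a generic normalized two-state LogDet form; this is an
    illustrative choice, not necessarily the paper's exact estimator.
    """
    presence = np.asarray(presence)
    keep = presence[cond] == 1          # genes present in the conditioning genome
    if not keep.any():
        raise ValueError("conditioning genome contains no genes")
    x = presence[i, keep]
    y = presence[j, keep]

    # 2x2 joint frequency matrix of (state in i, state in j)
    F = np.zeros((2, 2))
    for a in (0, 1):
        for b in (0, 1):
            F[a, b] = np.mean((x == a) & (y == b))

    det = np.linalg.det(F)
    if det <= 0:
        return float("inf")             # distance undefined for these data

    row = F.sum(axis=1)                 # marginal frequencies for taxon i
    col = F.sum(axis=0)                 # marginal frequencies for taxon j
    return -0.5 * (np.log(det) - 0.5 * np.log(row.prod() * col.prod()))


# Toy data (made up): row 0 serves as the conditioning genome.
presence = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
])
d_12 = conditioned_logdet(presence, cond=0, i=1, j=2)
d_13 = conditioned_logdet(presence, cond=0, i=1, j=3)
```

Calling the function with a different value of `cond` restricts the comparison to a different gene set, which is why distances, and potentially the inferred tree, can change with the choice of conditioning genome.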

Citation (APA)

Spencer, M., Bryant, D., & Susko, E. (2007). Conditioned genome reconstruction: How to avoid choosing the conditioning genome. Systematic Biology, 56(1), 25–43. https://doi.org/10.1080/10635150601156313
