Genome phylogenetic analysis based on extended gene contents

Xun Gu; Hongmei Zhang

Journal article (open access)

Molecular Biology and Evolution (2004) 21(7) 1401-1408

DOI: 10.1093/molbev/msh138

Abstract

With the rapid growth of whole-genome data, genome-scale approaches such as gene content have become popular for inferring genome phylogeny, including the tree of life. However, the underlying model of genome evolution is unclear, and the previously proposed (ad hoc) genome distance measures may violate additivity. In this article, we formulate a stochastic framework for genome evolution, which provides a basis for defining an additive genome distance. We show, however, that it is difficult to use typical gene content data, i.e., the presence or absence of gene families across genomes, to estimate this genome distance. We solve this problem by introducing the concept of extended gene content: the status of a gene family in a given genome can be absence, presence as a single copy, or presence as duplicates, and this three-state information can be used to estimate the genome distance and to infer phylogeny. Computer simulation shows that the new tree-making method is efficient, consistent, and fairly robust. An analysis of 35 complete microbial genomes demonstrates that the method is useful not only for studying the universal tree of life but also for exploring the evolutionary pattern of genomes.
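
The abstract does not give the distance formula, so the following Python sketch only illustrates the three-state coding of extended gene content. The copy-number data, the genome names A, B, C, and all helper names (`extended_state`, `encode`, `presence_absence`, `mismatch_distance`, `distance_matrix`) are hypothetical. The simple mismatch proportion it computes is an illustrative stand-in, not the additive genome distance derived in the article.

```python
from itertools import combinations

# Three extended gene content states described in the abstract.
ABSENT, SINGLE, DUPLICATE = 0, 1, 2


def extended_state(copy_number: int) -> int:
    """Collapse a gene family's copy number in one genome into a state."""
    if copy_number < 0:
        raise ValueError("copy number must be non-negative")
    if copy_number == 0:
        return ABSENT
    if copy_number == 1:
        return SINGLE
    return DUPLICATE  # two or more copies


def encode(copy_numbers: dict[str, list[int]]) -> dict[str, list[int]]:
    """Map each genome's per-family copy numbers to extended states."""
    return {g: [extended_state(c) for c in counts] for g, counts in copy_numbers.items()}


def presence_absence(profile: list[int]) -> list[int]:
    """Reduce an extended profile to ordinary (binary) gene content."""
    return [0 if s == ABSENT else 1 for s in profile]


def mismatch_distance(a: list[int], b: list[int]) -> float:
    """Fraction of gene families whose state differs between two genomes.

    Hypothetical stand-in only; NOT the additive genome distance of the article.
    """
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(profiles: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    """Pairwise mismatch distances for all genome pairs (sorted by name)."""
    names = sorted(profiles)
    return {(p, q): mismatch_distance(profiles[p], profiles[q])
            for p, q in combinations(names, 2)}


# Hypothetical copy numbers for four gene families in three genomes.
counts = {"A": [0, 1, 3, 1], "B": [1, 1, 2, 0], "C": [0, 2, 1, 1]}
profiles = encode(counts)
print(distance_matrix(profiles))
# {('A', 'B'): 0.5, ('A', 'C'): 0.5, ('B', 'C'): 1.0}
```

With presence/absence coding alone, genomes A and C in this toy example are indistinguishable (both reduce to `[0, 1, 1, 1]`, giving a mismatch of 0.0), whereas the three-state coding separates them. This is the kind of extra information extended gene content carries; a plain mismatch proportion like this one is not in general additive, which is the issue the article's stochastic framework is meant to address.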

Citation (APA)

Gu, X., & Zhang, H. (2004). Genome phylogenetic analysis based on extended gene contents. Molecular Biology and Evolution, 21(7), 1401–1408. https://doi.org/10.1093/molbev/msh138