Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model


Abstract

Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes; these are effectively chimeric sequences in which the phase at heterozygous sites is resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution; the diploid analytical integration algorithm, which averages over all phase resolutions; computational phase resolution using the program PHASE; and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences are much older than average coalescent times, but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of the cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower with recent divergences. Use of phased sequences inferred by the PHASE program produced small biases in parameter estimates. We analyze two real data sets, one of East Asian brown frogs and another of Rocky Mountain chipmunks, to demonstrate that heterozygote phase-resolution strategies have similar impacts on practical data analyses. We suggest that genome sequencing projects should produce unphased diploid genotype sequences if fully phased data are too challenging to generate, and should avoid haploid consensus sequences, which have heterozygous sites phased at random. If the analytical integration algorithm is computationally infeasible, computational phasing prior to population genomic analyses is an acceptable alternative.

[BPP; introgression; multispecies coalescent; phase; species tree.]
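The strategies compared above differ only in how they treat the phase of heterozygous sites. The Python sketch below illustrates the combinatorics involved; it is not the BPP or PHASE implementation. It assumes, for illustration, that an unphased diploid sequence encodes two-allele heterozygotes with IUPAC ambiguity codes, and the helper names (`all_phase_resolutions`, `random_phase_resolution`, `n_resolutions_for_locus`) are hypothetical. A sequence with h heterozygous sites is compatible with 2^(h-1) distinct haplotype pairs; this is the set that an approach averaging over all phase resolutions must account for, whereas random resolution keeps a single, possibly chimeric, pair.

```python
import itertools
import random

# Assumed input convention: IUPAC codes for two-allele heterozygous sites.
IUPAC_HET = {
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def het_positions(genotype):
    """Indices of heterozygous sites (IUPAC two-allele ambiguity codes)."""
    return [i for i, c in enumerate(genotype) if c in IUPAC_HET]


def all_phase_resolutions(genotype):
    """Enumerate every distinct unordered haplotype pair consistent with an
    unphased diploid sequence: 2**(h - 1) pairs for h >= 1 heterozygous sites."""
    hets = het_positions(genotype)
    if not hets:
        return [(genotype, genotype)]
    pairs = []
    # Fix the allele order at the first heterozygous site so that (a, b) and
    # (b, a) are not counted as two different resolutions.
    for flips in itertools.product((False, True), repeat=len(hets) - 1):
        hap1, hap2 = list(genotype), list(genotype)
        for k, pos in enumerate(hets):
            a, b = IUPAC_HET[genotype[pos]]
            if k > 0 and flips[k - 1]:
                a, b = b, a
            hap1[pos], hap2[pos] = a, b
        pairs.append(("".join(hap1), "".join(hap2)))
    return pairs


def random_phase_resolution(genotype, rng):
    """Resolve each heterozygous site independently at random (hypothetical
    stand-in for consensus calling); returns a single haplotype pair."""
    hap1, hap2 = list(genotype), list(genotype)
    for pos in het_positions(genotype):
        a, b = IUPAC_HET[genotype[pos]]
        if rng.random() < 0.5:
            a, b = b, a
        hap1[pos], hap2[pos] = a, b
    return "".join(hap1), "".join(hap2)


def n_resolutions_for_locus(genotypes):
    """Joint number of phase resolutions at one locus across diploid
    individuals: the product of 2**(h_i - 1) over individuals with h_i >= 1."""
    total = 1
    for g in genotypes:
        h = len(het_positions(g))
        if h > 0:
            total *= 2 ** (h - 1)
    return total


if __name__ == "__main__":
    g = "ACRTGYA"  # R = A/G at index 2, Y = C/T at index 5
    for hap1, hap2 in all_phase_resolutions(g):
        print(hap1, hap2)
    # ACATGCA ACGTGTA
    # ACATGTA ACGTGCA
    print(n_resolutions_for_locus([g, "ACATGCA", "RYSWKM"]))  # 64
    print(random_phase_resolution(g, random.Random(7)))
```

Because the count multiplies across heterozygous sites and across individuals, a naive enumeration grows quickly with sample size and diversity, which suggests why exact averaging over phase can become costly and why prior computational phasing is offered as the fallback.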


Citation (APA)

Huang, J., Bennett, J., Flouri, T., Leaché, A. D., & Yang, Z. (2022). Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model. Systematic Biology, 71(2), 334–352. https://doi.org/10.1093/sysbio/syab047
