Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications

This article is free to access.

Abstract

Background: As the error rate is high and the distribution of errors across sites is non-uniform in next generation sequencing (NGS) data, it has been a challenge to estimate DNA polymorphism (θ) accurately from NGS data.

Results: By computer simulations, we compare the two methods of data acquisition: sequencing each diploid individual separately and sequencing the pooled sample. Under the current NGS error rate, sequencing each individual separately offers little advantage unless the coverage per individual is high (>20X). We hence propose a new method for estimating θ from pooled samples that have been subjected to two separate rounds of DNA sequencing. Since errors from the two sequencing applications are usually non-overlapping, it is possible to separate low frequency polymorphisms from sequencing errors. Simulation results show that the dual applications method is reliable even when the error rate is high and θ is low.

Conclusions: In studies of natural populations where the sequencing coverage is usually modest (~2X per individual), the dual applications method on pooled samples should be a reasonable choice.

© 2013 He et al.; licensee BioMed Central Ltd.
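The intuition behind the dual applications idea can be illustrated with a minimal Python sketch. It is not the estimator from the article, which the abstract does not spell out. It assumes that errors arise independently in the two sequencing runs, so a non-reference allele seen at the same site in both runs is treated as a candidate polymorphism, and it then plugs the count of such sites into the standard Watterson estimator θ_W = S / (a_n L). The function names, the `min_reads` threshold, and the toy read counts are all hypothetical.

```python
from typing import List, Sequence


def harmonic_number(num_chromosomes: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, the Watterson normalizing constant."""
    return sum(1.0 / i for i in range(1, num_chromosomes))


def sites_confirmed_by_both_runs(
    alt_counts_run1: Sequence[int],
    alt_counts_run2: Sequence[int],
    min_reads: int = 1,  # hypothetical threshold, not taken from the article
) -> List[int]:
    """Return indices of sites whose non-reference allele appears in both runs."""
    if len(alt_counts_run1) != len(alt_counts_run2):
        raise ValueError("both runs must cover the same sites")
    return [
        i
        for i, (a, b) in enumerate(zip(alt_counts_run1, alt_counts_run2))
        if a >= min_reads and b >= min_reads
    ]


def watterson_theta(num_segregating: int, num_chromosomes: int, num_sites: int) -> float:
    """Standard Watterson estimator of theta per site."""
    return num_segregating / (harmonic_number(num_chromosomes) * num_sites)


# Toy data: non-reference read counts per site in each sequencing run.
run1 = [0, 3, 1, 0, 0, 2, 0, 1, 0, 0]
run2 = [0, 2, 0, 0, 1, 4, 0, 0, 0, 0]

# Sites 2 and 7 (run 1 only) and site 4 (run 2 only) are treated as errors.
confirmed = sites_confirmed_by_both_runs(run1, run2)
single_run = [i for i, a in enumerate(run1) if a >= 1]

# Assumed pool of 2 diploid individuals, i.e. 4 chromosomes.
theta_dual = watterson_theta(len(confirmed), num_chromosomes=4, num_sites=len(run1))
theta_single = watterson_theta(len(single_run), num_chromosomes=4, num_sites=len(run1))
```

In this toy case a single run would count four variant sites, while requiring agreement between runs keeps two, so the single-run estimate is inflated by sites that the second run does not support. A real analysis would also have to account for true variants missed at low coverage and for the non-uniform error distribution noted in the Background, neither of which this sketch models.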

Cite

He, Z., Li, X., Ling, S., Fu, Y. X., Hungate, E., Shi, S., & Wu, C. I. (2013). Estimating DNA polymorphism from next generation sequencing data with high error rate by dual sequencing applications. BMC Genomics, 14(1). https://doi.org/10.1186/1471-2164-14-535
