Benchmarking datasets for assembly-based variant calling using high-fidelity long reads

8Citations
Citations of this article
15Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: Recent advances in long-read sequencing technologies have enabled accurate identification of all genetic variants in individuals or cells; this procedure is known as variant calling. However, benchmarking studies on variant calling using different long-read sequencing technologies are still lacking. Results: We used two Caenorhabditis elegans strains to measure several variant calling metrics. These two strains shared true-positive genetic variants that were introduced during strain generation. In addition, both strains contained common and distinguishable variants induced by DNA damage, possibly leading to false-positive estimation. We obtained accurate and noisy long reads from both strains using high-fidelity (HiFi) and continuous long-read (CLR) sequencing platforms, and compared the variant calling performance of the two platforms. HiFi identified a 1.65-fold higher number of true-positive variants on average, with 60% fewer false-positive variants, than CLR did. We also compared read-based and assembly-based variant calling methods in combination with subsampling of various sequencing depths and demonstrated that variant calling after genome assembly was particularly effective for detection of large insertions, even with 10 × sequencing depth of accurate long-read sequencing data. Conclusions: By directly comparing the two long-read sequencing technologies, we demonstrated that variant calling after genome assembly with 10 × or more depth of accurate long-read sequencing data allowed reliable detection of true-positive variants. Considering the high cost of HiFi sequencing, we herein propose appropriate methodologies for performing cost-effective and high-quality variant calling: 10 × assembly-based variant calling. The results of the present study may facilitate the development of methods for identifying all genetic variants at the population level.

Cite

CITATION STYLE

APA

Lee, H., Kim, J., & Lee, J. (2023). Benchmarking datasets for assembly-based variant calling using high-fidelity long reads. BMC Genomics, 24(1). https://doi.org/10.1186/s12864-023-09255-y

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free