An optimized GATK4 pipeline for Plasmodium falciparum whole genome sequencing variant calling and analysis

Abstract

Background: Accurate variant calls from whole genome sequencing (WGS) of Plasmodium falciparum infections are crucial in malaria population genomics. Here, a P. falciparum variant calling pipeline based on GATK version 4 (GATK4) was optimized and applied to 6626 public Illumina WGS samples.

Methods: Control WGS and accurate PacBio assemblies of 10 laboratory strains were leveraged to optimize parameters controlling heterozygosity, local assembly region size, ploidy, and mapping and base quality in both GATK HaplotypeCaller and GenotypeGVCFs. From these controls, a high-quality training dataset was generated to recalibrate the raw variant data.

Results: On current high-quality samples (read length = 250 bp, insert size = 405–524 bp), the optimized pipeline showed improved sensitivity (86.6 ± 1.7% for SNPs and 82.2 ± 5.9% for indels) compared with the default GATK4 pipeline (77.7 ± 1.3% for SNPs and 73.1 ± 5.1% for indels; adjusted P < 0.001) and with previous variant calling using GATK version 3 (GATK3; 70.3 ± 3.0% for SNPs and 59.7 ± 5.8% for indels; adjusted P < 0.001). Its sensitivity on simulated mixed-infection samples (80.8 ± 6.1% for SNPs and 78.3 ± 5.1% for indels) was again higher than that of default GATK4 (68.8 ± 6.0% for SNPs and 38.9 ± 0.7% for indels; adjusted P < 0.001). Precision was high and comparable across all pipelines for each type of data tested. The resulting combination of high-quality SNPs and indels increases the resolution with which local population structure can be detected in sub-Saharan Africa. Finally, increasing ploidy improves the detection of drug resistance mutations and the estimation of complexity of infection.

Conclusions: Overall, this study provides an optimized P. falciparum GATK4 variant calling pipeline as a resource that should help improve genomic studies of malaria.
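The results above compare pipelines by sensitivity and precision, reported separately for SNPs and indels. Under the standard definitions, sensitivity = TP / (TP + FN) and precision = TP / (TP + FP), where true positives (TP) are calls that match the truth set, false negatives (FN) are truth variants that were missed, and false positives (FP) are calls absent from the truth set. The following is a minimal sketch of that scoring, assuming variants are already normalized and compared by exact (chrom, pos, ref, alt) match. The `Variant` type, the helper functions, and the toy coordinates are hypothetical. The abstract does not describe how the truth variants were derived or how representation differences between callers were reconciled.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A biallelic variant; assumes inputs are already normalized (left-aligned)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def is_snp(v: Variant) -> bool:
    # Single-base REF and ALT -> SNP; everything else is treated as an indel here.
    return len(v.ref) == 1 and len(v.alt) == 1


def sensitivity_precision(called: set, truth: set) -> tuple:
    """Return (sensitivity, precision) of `called` against `truth` by exact match."""
    tp = len(called & truth)   # calls that are in the truth set
    fn = len(truth - called)   # truth variants that were missed
    fp = len(called - truth)   # calls that are not in the truth set
    # Ratios are undefined for empty denominators; this sketch reports 0.0 then.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


def by_type(called: set, truth: set) -> dict:
    """Score SNPs and indels separately, mirroring how the abstract reports them."""
    return {
        "snp": sensitivity_precision({v for v in called if is_snp(v)},
                                     {v for v in truth if is_snp(v)}),
        "indel": sensitivity_precision({v for v in called if not is_snp(v)},
                                       {v for v in truth if not is_snp(v)}),
    }


if __name__ == "__main__":
    # Hypothetical toy variants; coordinates are illustrative, not from the study.
    truth = {
        Variant("Pf3D7_01_v3", 100, "A", "G"),
        Variant("Pf3D7_01_v3", 200, "AT", "A"),
        Variant("Pf3D7_02_v3", 50, "C", "T"),
        Variant("Pf3D7_02_v3", 90, "G", "GTA"),
    }
    called = {
        Variant("Pf3D7_01_v3", 100, "A", "G"),
        Variant("Pf3D7_01_v3", 200, "AT", "A"),
        Variant("Pf3D7_02_v3", 50, "C", "T"),
        Variant("Pf3D7_02_v3", 75, "G", "A"),
    }
    print(by_type(called, truth))
```

In this toy example, the extra SNP call lowers SNP precision, and the missed insertion lowers indel sensitivity. Real comparisons against assembly-derived truth sets usually also require normalizing variant representations (for example, left-aligning indels) before exact matching.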

Citation (APA)

Niaré, K., Greenhouse, B., & Bailey, J. A. (2023). An optimized GATK4 pipeline for Plasmodium falciparum whole genome sequencing variant calling and analysis. Malaria Journal, 22(1). https://doi.org/10.1186/s12936-023-04632-0
