A Fast, Reproducible, High-Throughput Variant Calling Workflow for Population Genomics

Cade D. Mirchandani; Allison J. Shultz; Gregg W.C. Thomas; Sara J. Smith; Mara Baylis; Brian Arnold; Russ Corbett-Detig; Erik Enbody; Timothy B. Sackton

Journal Article, Open Access

Molecular Biology and Evolution (2024) 41(1)

DOI: 10.1093/molbev/msad270

Abstract

The increasing availability of genomic resequencing data sets and high-quality reference genomes across the tree of life presents exciting opportunities for comparative population genomic studies. However, substantial challenges prevent the simple reuse of data across different studies and species. These challenges arise from variability in variant calling pipelines, from differences in data quality, and from the need for computationally intensive reanalysis. Here, we present snpArcher, a flexible and highly efficient workflow designed for the analysis of genomic resequencing data in nonmodel organisms. snpArcher provides a standardized variant calling pipeline and includes modules for variant quality control, data visualization, variant filtering, and other downstream analyses. Implemented in Snakemake, snpArcher is user-friendly and reproducible, and it is designed to run on high-performance computing clusters and in cloud environments. To demonstrate the flexibility of this pipeline, we applied snpArcher to 26 public resequencing data sets from nonmammalian vertebrates. The resulting variant data sets are hosted publicly to enable future comparative population genomic analyses. Through its extensibility and the availability of these public data sets, snpArcher will contribute to a broader understanding of genetic variation across species by facilitating the rapid use and reuse of large genomic data sets.

Cite (APA)

Mirchandani, C. D., Shultz, A. J., Thomas, G. W. C., Smith, S. J., Baylis, M., Arnold, B., … Sackton, T. B. (2024). A Fast, Reproducible, High-Throughput Variant Calling Workflow for Population Genomics. Molecular Biology and Evolution, 41(1). https://doi.org/10.1093/molbev/msad270