SeqArray-a storage-efficient high-performance data format for WGS variant calls

Abstract

Motivation: Whole-genome sequencing (WGS) data are being generated at an unprecedented rate. Analysis of WGS data requires a flexible data format to store the different types of DNA variation. Variant call format (VCF) is a general text-based format developed to store variant genotypes and their annotations. However, VCF files are large and data retrieval is relatively slow. Here we introduce a new WGS variant data format, implemented in the R/Bioconductor package 'SeqArray', that stores variant calls in an array-oriented manner. It provides the same capabilities as VCF, together with multiple high-compression options and data access using high-performance parallel computing.

Results: Benchmarks using 1000 Genomes Phase 3 data show file sizes of 14.0 GB (VCF), 12.3 GB (BCF, binary VCF), 3.5 GB (BGT) and 2.6 GB (SeqArray), respectively. Reading genotypes with the SeqArray package is two to three times faster than with the htslib C library using BCF files. For the allele frequency calculation, the SeqArray implementation is over 5 times faster than PLINK v1.9 reading VCF and BCF files, and over 16 times faster than vcftools. When used in conjunction with other R/Bioconductor packages, the SeqArray package provides users with a flexible, feature-rich, high-performance programming environment for analysis of WGS variant data.
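Two ideas from the abstract, array-oriented genotype storage and a per-variant allele frequency calculation, can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical in-memory layout (variant x sample x allele copy) and a hypothetical 2-bit code in which 0 is the reference allele, 1 and 2 are alternate alleles, and 3 marks a missing call. The names `pack_2bit`, `unpack_2bit`, `alt_allele_freq` and `MISSING` are illustrative only; this is not SeqArray's actual file format or R API.

```python
from typing import List, Optional

# Hypothetical 2-bit code reserved for a missing allele call (assumption, not SeqArray's spec).
MISSING = 3


def pack_2bit(values: List[int]) -> bytes:
    """Pack integers in 0..3 into bytes, four values per byte, lowest bits first."""
    out = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 0 <= v <= 3:
            raise ValueError("2-bit packing requires values in 0..3")
        out[i // 4] |= v << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, n: int) -> List[int]:
    """Recover the first n 2-bit values packed by pack_2bit."""
    return [(data[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


def alt_allele_freq(genotypes: List[List[List[int]]]) -> List[Optional[float]]:
    """Alternate-allele frequency per variant, ignoring missing calls.

    genotypes[v][s][k] is allele copy k of sample s at variant v (hypothetical layout).
    Returns None for a variant with no observed alleles.
    """
    freqs: List[Optional[float]] = []
    for variant in genotypes:
        # Flatten the sample x ploidy slice for this variant and drop missing calls.
        calls = [a for sample in variant for a in sample if a != MISSING]
        if not calls:
            freqs.append(None)
        else:
            # Any allele index above 0 counts as non-reference.
            freqs.append(sum(1 for a in calls if a > 0) / len(calls))
    return freqs


if __name__ == "__main__":
    geno = [
        [[0, 0], [0, 1], [1, 1]],              # 3 alternate alleles out of 6
        [[0, 0], [MISSING, MISSING], [0, 1]],  # 1 alternate allele out of 4 observed
    ]
    print(alt_allele_freq(geno))  # [0.5, 0.25]
    flat = [a for variant in geno for sample in variant for a in sample]
    packed = pack_2bit(flat)
    print(len(flat), "calls stored in", len(packed), "bytes")  # 12 calls stored in 3 bytes
```

Storing four allele calls per byte in a contiguous array, rather than as text on one line per variant, is one way an array-oriented layout saves space and lets a statistic be computed by scanning a single array; the compression options benchmarked in the abstract are not reproduced here.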

Zheng, X., Gogarten, S. M., Lawrence, M., Stilp, A., Conomos, M. P., Weir, B. S., … Levine, D. (2017). SeqArray-a storage-efficient high-performance data format for WGS variant calls. Bioinformatics, 33(15), 2251–2257. https://doi.org/10.1093/bioinformatics/btx145