Motivation: The generation of genotype data has grown exponentially over the last decade. The large size of recent datasets brings a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses. Results: We show that xSqueezeIt (XSI) reduces file size by 4-20× compared with compressed BCF, and we demonstrate its potential for 'compressive genomics' on the UK Biobank whole-genome sequencing genotypes, with 8× faster loading times, 5× faster run-of-homozygosity computation, 30× faster dot-product computation and 280× faster allele counts.
Wertenbroek, R., Rubinacci, S., Xenarios, I., Thoma, Y., & Delaneau, O. (2022). XSI - a genotype compression tool for compressive genomics in large biobanks. Bioinformatics, 38(15), 3778–3784. https://doi.org/10.1093/bioinformatics/btac413