A comparison study of succinct data structures for use in GWAS

Abstract

Background: In recent years, genetic data analysis has seen a rapid increase in the scale of data to be analyzed. Schadt et al. (NRG 11:647-657, 2010) argued that, with data sets approaching the petabyte scale, data-related challenges such as formatting, management, and transfer are increasingly important topics that need to be addressed. Succinct data structures are one way to reduce the physical size of a data set without resorting to expensive compression techniques. In this work, we consider 2- and 3-bit encoding schemes for genotype data and compare the computational performance of allele and genotype counting algorithms operating on genotype data encoded in each scheme.

Results: We compare 2- and 3-bit genotype encoding schemes for use in genotype counting algorithms. We find a 20% overhead when building simple frequency tables from 2-bit encoded genotypes. However, building pairwise count tables for genome-wide epistasis is 1.0% more efficient with the 2-bit encoding.

Conclusions: In this work, we compared the performance benefits and disadvantages of using more densely packed genotype data representations in Genome-Wide Association Studies (GWAS). We implemented a 2-bit encoding for genotype data and compared it against a more commonly used 3-bit encoding scheme. We also developed a C++ library, libgwaspp, which offers these data structures and implementations of several common GWAS algorithms. In general, the 2-bit encoding consumes less memory than the 3-bit encoding and is slightly more efficient in some algorithms.

© 2013 Putnam et al.; licensee BioMed Central Ltd.

Citation (APA)
Putnam, P. P., Zhang, G., & Wilsey, P. A. (2013). A comparison study of succinct data structures for use in GWAS. BMC Bioinformatics, 14(1). https://doi.org/10.1186/1471-2105-14-369
