GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species

Liubin Zhang; Yangyang Yuan; Wenjie Peng; Bin Tang; Mulin Jun Li; Hongsheng Gui; Qiang Wang; Miaoxin Li

Journal Article (Open Access)

Genome Biology (2023) 24(1)

DOI: 10.1186/s13059-023-02906-z

Abstract

Whole-genome sequencing projects of millions of subjects produce enormous genotype datasets, which impose a heavy memory burden and long computation times. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses are substantially faster when built on GBC to access the genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research.

Cite

Zhang, L., Yuan, Y., Peng, W., Tang, B., Li, M. J., Gui, H., … Li, M. (2023). GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species. Genome Biology, 24(1). https://doi.org/10.1186/s13059-023-02906-z
