BDBG: A bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs


Abstract

Dramatic increases in data produced by next-generation sequencing (NGS) technologies demand data compression tools to save storage space. However, effective and efficient compression of genome sequencing data remains an unresolved challenge in NGS data studies. In this paper, we propose a novel alignment-free and reference-free compression method, BdBG, which is the first to compress genome sequencing data with dynamic de Bruijn graphs built from the data after bucketing. Compared with existing de Bruijn graph methods, BdBG stores only a list of bucket indexes and bifurcations for the raw read sequences, which effectively reduces storage space. Experimental results on several genome sequencing datasets show the effectiveness of BdBG over three state-of-the-art methods. BdBG is written in Python and is open-source software distributed under the MIT license, available for download at https://github.com/rongjiewang/BdBG.
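The idea of storing only bucket indexes and bifurcations can be illustrated with a small sketch. It is not the BdBG implementation: the bucket key (the lexicographically smallest m-mer of a read), the static per-bucket graph, and all function names and parameters (`m`, `k`, `encode`, `decode`) are assumptions made for illustration, and reads are assumed to be at least `m` bases long. BdBG itself builds its graphs dynamically, which this sketch does not attempt.

```python
from collections import defaultdict


def minimizer(read, m):
    """Return the lexicographically smallest m-mer of a read (assumed bucket key)."""
    return min(read[i:i + m] for i in range(len(read) - m + 1))


def bucket_reads(reads, m):
    """Group reads that share the same bucket key."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, m)].append(read)
    return dict(buckets)


def build_graph(reads, k):
    """Map each k-mer to the set of bases that follow it in any read of the bucket."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + k])
    return dict(graph)


def encode(read, graph, k):
    """Keep the first k-mer, the read length, and the base taken at each branching node."""
    choices = [read[i + k] for i in range(len(read) - k)
               if len(graph[read[i:i + k]]) > 1]
    return read[:k], len(read), "".join(choices)


def decode(start, length, choices, graph, k):
    """Walk the graph from the start k-mer, consuming one stored base per branching node."""
    seq = start
    remaining = iter(choices)
    while len(seq) < length:
        successors = graph[seq[-k:]]
        if len(successors) > 1:
            seq += next(remaining)        # bifurcation: use the stored choice
        else:
            seq += next(iter(successors))  # single path: nothing was stored
    return seq


reads = ["ACGTACGA", "ACGTACGT", "TTGCATGC"]
buckets = bucket_reads(reads, m=3)
for key, group in buckets.items():
    graph = build_graph(group, k=3)
    for read in group:
        code = encode(read, graph, k=3)
        assert decode(*code, graph, k=3) == read
```

In this toy input the two reads that share the key `ACG` differ only at nodes where the graph branches, so each is reduced to its first 3-mer, its length, and two stored bases. A decoder in this sketch still needs the graph itself, which is one reason the real method's dynamic construction matters.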


Citation

Wang, R., Li, J., Bai, Y., Zang, T., & Wang, Y. (2018). BDBG: A bucket-based method for compressing genome sequencing data with dynamic de Bruijn graphs. PeerJ, 2018(10). https://doi.org/10.7717/peerj.5611
