FQZip: Lossless Reference-Based Compression of Next Generation Sequencing Data in FASTQ Format

  • Zhang Y
  • Li L
  • Xiao J
  • et al.

Abstract

High-throughput DNA sequence data generated by next generation sequencing (NGS) technologies place tremendous pressure on data storage and transmission, and data compression is a candidate solution for relieving it. This paper proposes FQZip, a lossless reference-based compression framework for NGS data in FASTQ format. The three components of a FASTQ file (metadata, sequence reads, and quality scores) are compressed independently, each with its own coding scheme. The sequence reads are aligned to a reference genome, and arithmetic coding, Huffman coding, and LZMA are then used to store the alignment results needed to reconstruct them. The metadata and quality scores are stored with other simple yet efficient compression mechanisms. Experimental results on real-world NGS data indicate that FQZip achieves a better compression ratio than other state-of-the-art NGS data compression methods.
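
The stream separation described in the abstract can be illustrated with a minimal Python sketch. This is not FQZip's implementation: the function names (`split_fastq`, `compress_streams`, `decompress_streams`) are hypothetical, no reference alignment is performed, and a general-purpose LZMA pass from the standard library stands in for the component-specific coders the paper describes. It also assumes four-line records with Unix line endings and a bare `+` separator line.

```python
import lzma


def split_fastq(text):
    """Split FASTQ text into three parallel streams: metadata, reads, qualities."""
    lines = text.rstrip("\n").split("\n")
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    metadata, reads, qualities = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Assumption: the separator line is a bare "+" (no repeated header).
        if not header.startswith("@") or plus != "+":
            raise ValueError(f"malformed record starting at line {i + 1}")
        metadata.append(header[1:])
        reads.append(seq)
        qualities.append(qual)
    return metadata, reads, qualities


def compress_streams(text):
    """Compress each stream independently (plain LZMA as a stand-in coder)."""
    return [
        lzma.compress("\n".join(stream).encode("ascii"))
        for stream in split_fastq(text)
    ]


def decompress_streams(blobs):
    """Rebuild the original FASTQ text from the three compressed streams."""
    metadata, reads, qualities = (
        lzma.decompress(blob).decode("ascii").split("\n") for blob in blobs
    )
    return "".join(
        f"@{m}\n{r}\n+\n{q}\n" for m, r, q in zip(metadata, reads, qualities)
    )
```

Keeping the streams apart lets each one be modeled on its own statistics: read identifiers tend to be highly repetitive, bases come from a very small alphabet, and quality strings behave differently from both. A round trip through `decompress_streams(compress_streams(text))` returns the input unchanged under the assumptions stated above.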

Cite

Zhang, Y., Li, L., Xiao, J., Yang, Y., & Zhu, Z. (2015). FQZip: Lossless Reference-Based Compression of Next Generation Sequencing Data in FASTQ Format (pp. 127–135). https://doi.org/10.1007/978-3-319-13356-0_11
