LFQC: A lossless compression algorithm for FASTQ files

Sudipta Pathak; Sanguthevar Rajasekaran

Article (Open Access)

Bioinformatics

DOI: 10.1093/bioinformatics/btu701

Abstract

Motivation: Next Generation Sequencing (NGS) technologies have revolutionized genomic research by reducing the cost of whole-genome sequencing. One of the biggest challenges posed by modern sequencing technology is the economical storage of NGS data. Storing raw data is infeasible because of its enormous size and high redundancy. In this paper, we address the problem of storing and transmitting large FASTQ files using innovative compression techniques.

Results: We introduce a new lossless, non-reference-based FASTQ compression algorithm named LFQC. We have compared our algorithm with other state-of-the-art big data compression algorithms, including gzip, bzip2, fastqz (Bonfield and Mahoney, 2013), fqzcomp (Bonfield and Mahoney, 2013), G-SQZ (Tembe, et al., 2010), SCALCE (Hach, et al., 2012), Quip (Jones, et al., 2012), DSRC (Deorowicz, et al., 2011), and DSRC-LZ (Deorowicz, et al., 2011), among others. This comparison reveals that our algorithm achieves better compression ratios. The improvement obtained is up to 225%. For example, on one of the data sets (SRR065390_1), the average improvement (over all the algorithms compared) is 74.62%.
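The abstract reports results as percentage improvements in compression ratio but does not define the formula. The sketch below shows one plausible reading, assuming the ratio is original size divided by compressed size and the improvement is the relative gain over a baseline ratio. The helper names `compression_ratio` and `percent_improvement` and the synthetic FASTQ records are illustrative assumptions, not taken from the paper. LFQC itself is not implemented here; gzip and bzip2 from the Python standard library stand in as two of the baselines named above.

```python
import bz2
import gzip


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Assumed definition: original size divided by compressed size."""
    return len(raw) / len(compressed)


def percent_improvement(ratio_new: float, ratio_baseline: float) -> float:
    """Assumed definition: relative gain of one ratio over a baseline, in percent."""
    return 100.0 * (ratio_new - ratio_baseline) / ratio_baseline


# Synthetic FASTQ data (hypothetical): each record has four lines,
# an identifier, the bases, a '+' separator, and the quality scores.
record = b"@read_%d\nACGTACGTACGTACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n"
raw = b"".join(record % i for i in range(1000))

gz = gzip.compress(raw)
bz = bz2.compress(raw)

# Lossless compression must reproduce the input exactly.
assert gzip.decompress(gz) == raw
assert bz2.decompress(bz) == raw

ratio_gzip = compression_ratio(raw, gz)
ratio_bzip2 = compression_ratio(raw, bz)

print(f"gzip ratio:  {ratio_gzip:.2f}")
print(f"bzip2 ratio: {ratio_bzip2:.2f}")
print(f"bzip2 vs gzip: {percent_improvement(ratio_bzip2, ratio_gzip):+.2f}%")
```

Under this assumed definition, an improvement of 225% would mean a compression ratio 3.25 times that of the baseline. The synthetic records are far more repetitive than real sequencing data, so the ratios printed here say nothing about real-world performance.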

Cite

Pathak, S., & Rajasekaran, S. (2019, May 1). LFQC: A lossless compression algorithm for FASTQ files. Bioinformatics. Oxford University Press. https://doi.org/10.1093/bioinformatics/btu701
