ENANO: Encoder for NANOpore FASTQ files

Abstract

Motivation: The amount of genomic data generated globally is growing explosively, increasing the need for processing, storage and transmission resources and motivating the development of efficient compression tools for these data. Work so far has focused mainly on compressing data generated by short-read technologies. However, nanopore sequencing technologies are rapidly gaining popularity due to the large increase in the average size of the produced reads, the reduction in their cost and the portability of the sequencing technology. We present ENANO (Encoder for NANOpore), a novel lossless compression algorithm designed specifically for nanopore sequencing FASTQ files.

Results: ENANO focuses mainly on compressing the quality scores, as they dominate the size of the compressed file. ENANO offers two modes, Maximum Compression and Fast (default), which trade off compression efficiency against speed. We tested ENANO, the current state-of-the-art compressor SPRING and the general-purpose compressor pigz on several publicly available nanopore datasets. The results show that the proposed algorithm consistently achieves the best compression performance (in both modes) on every nanopore dataset considered, with an average improvement over pigz and SPRING of >24.7% and 6.3%, respectively. In addition, ENANO is 2.9× faster than SPRING at encoding and 1.7× faster at decoding, with memory consumption up to 0.2 GB.

Availability and implementation: ENANO is freely available for download at: https://github.com/guilledufort/EnanoFASTQ.

Supplementary information: Supplementary data are available at Bioinformatics online.
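The Results statement that quality scores dominate the compressed file size can be checked informally on any FASTQ file. The following is a minimal sketch, not ENANO's method: it assumes standard four-line FASTQ records, separates the header, base and quality streams, and compresses each one independently with zlib (DEFLATE, the same family of general-purpose compression that pigz uses). The function names `split_fastq_streams` and `compressed_sizes` are hypothetical and introduced only for this illustration.

```python
import zlib


def split_fastq_streams(text):
    """Split FASTQ text into header, base and quality streams.

    Assumes every record spans exactly four lines:
    header, bases, '+' separator, quality scores.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    headers = "\n".join(lines[0::4])  # lines starting with '@'
    bases = "\n".join(lines[1::4])    # nucleotide sequences
    quals = "\n".join(lines[3::4])    # quality score strings
    return headers, bases, quals


def compressed_sizes(text, level=9):
    """Return the zlib-compressed size in bytes of each stream."""
    headers, bases, quals = split_fastq_streams(text)
    streams = {"headers": headers, "bases": bases, "qualities": quals}
    return {
        name: len(zlib.compress(data.encode("ascii"), level))
        for name, data in streams.items()
    }
```

Calling `compressed_sizes` on the contents of a nanopore FASTQ file returns one byte count per stream, so the share taken by the quality scores can be read off directly. The numbers only indicate what a general-purpose compressor achieves per stream; they say nothing about the sizes ENANO or SPRING would produce.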

Cite

Álvarez, G. D., Seroussi, G., Smircich, P., Sotelo, J., Ochoa, I., & Martín, Á. (2020). ENANO: Encoder for NANOpore FASTQ files. Bioinformatics, 36(16), 4506–4507. https://doi.org/10.1093/bioinformatics/btaa551
