Rapid, separable compression enables fast analyses of sequence alignments

Abstract

The continued growth of sequencing data demands novel, scalable approaches to its storage and transmission. It is also crucial that analyses can be run on the data in its compressed form without fully reconstructing it. We propose a novel approach to compressing sequence alignment data, a well-established format used for tasks ranging from genome assembly to variant calling. Alignment files may exceed the size of the original sequence data by an order of magnitude. However, Referee, our tool implementing this approach, compresses alignment files to 1/10 of the original SAM file size, making it twice as efficient as BAM, SAM's binary variant. Referee is fast and highly parallelizable, and it outperforms state-of-the-art tools by an average of 8.1% while supporting a variety of sequence-related tasks that require only partial decompression. Computations such as sequencing depth, which involve seeking through all alignments, take 8 to 44 seconds with Referee, compared with tens of minutes with samtools. Referee uses a lightweight streaming clustering algorithm to improve the compression of quality values, and it encodes sequence information very efficiently, with compression rates as low as 0.06 bits per base. Its modular structure allows users to omit extraneous alignment information from the download, reducing sequencing data from many gigabytes to under a hundred megabytes.
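
The abstract notes that tasks like sequencing depth need only partial decompression. The sketch below shows why separable storage helps. It assumes alignments are split into independently compressed streams: `coords`, `seqs`, and `quals`. These stream names, the toy records, and the use of zlib as a general-purpose stand-in compressor are all hypothetical; this is not Referee's actual format or encoder. Depth is computed with a difference array over alignment intervals, so only the small coordinate stream is ever decompressed. For simplicity, each alignment is treated as one contiguous reference span, which ignores CIGAR details such as deletions or spliced reads.

```python
import struct
import zlib
from itertools import accumulate

# Hypothetical toy alignment records: (start, aligned length, read bases, qualities).
# Illustrative layout only; not Referee's on-disk format.
ALIGNMENTS = [
    (0, 5, "ACGTA", "IIIII"),
    (2, 4, "GTAC", "HHHH"),
    (3, 5, "TACGT", "GGGGG"),
]


def compress_separately(records):
    """Split records into independent streams and compress each one on its own."""
    # Start/length pairs are packed as little-endian unsigned 32-bit integers.
    coords = b"".join(struct.pack("<II", start, length) for start, length, _, _ in records)
    seqs = "\n".join(seq for _, _, seq, _ in records).encode()
    quals = "\n".join(qual for _, _, _, qual in records).encode()
    # zlib is a stand-in; a real tool would use specialized per-stream encoders.
    return {
        "coords": zlib.compress(coords),
        "seqs": zlib.compress(seqs),
        "quals": zlib.compress(quals),
    }


def depth_from_coords(streams, genome_length):
    """Compute per-base depth by decompressing only the coordinate stream."""
    raw = zlib.decompress(streams["coords"])
    diff = [0] * (genome_length + 1)
    for start, length in struct.iter_unpack("<II", raw):
        diff[start] += 1           # coverage begins at the alignment start
        diff[start + length] -= 1  # and stops just past the last aligned base
    # A running sum of the difference array yields the depth at each position.
    return list(accumulate(diff[:genome_length]))


streams = compress_separately(ALIGNMENTS)
print(depth_from_coords(streams, genome_length=8))  # [1, 1, 2, 3, 3, 2, 1, 1]
```

The `seqs` and `quals` streams are never touched here. Those streams hold most of an alignment file's bulk, so skipping them is the kind of saving that a separable layout makes possible.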

Citation

Filippova, D., & Kingsford, C. (2015). Rapid, separable compression enables fast analyses of sequence alignments. In BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 194–201). Association for Computing Machinery, Inc. https://doi.org/10.1145/2808719.2808739