Bitpacking techniques for indexing genomes: II. Enhanced suffix arrays

Abstract

Background: Suffix arrays and their variants are widely used to represent genomes in search applications. Enhanced suffix arrays (ESAs) provide fast search, but require large auxiliary data structures to store longest common prefix and child interval information. We explore techniques for compressing ESAs to accelerate genomic search and reduce memory requirements.

Results: We evaluate various bitpacking techniques that store integers in fewer than 32 bits each, as well as bytecoding methods that reserve a single byte per integer whenever possible. Our results on the fly, chicken, and human genomes show that bytecoding with an exception guide array is the fastest method for retrieving auxiliary information. Genomic search can be further accelerated with a data structure called a discriminating character array, which reduces memory accesses to the suffix array and the genome string. Finally, integrating the storage of the auxiliary and discriminating character arrays speeds up genomic search further.

Conclusions: The combination of exception guide arrays, a discriminating character array, and integrated data storage provides a 2- to 3-fold increase in speed for genomic searching compared with using bytecoding alone, and is 20% faster and 40% more space-efficient than an uncompressed ESA.
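
The abstract does not spell out how bytecoding with an exception guide array is laid out, so the following is only an illustrative sketch under stated assumptions, not the article's implementation. It assumes that each integer (for example, a longest-common-prefix value) is stored in one byte when it is below 255, that the byte value 255 marks an exception whose true value is kept in a sorted side table, and that a guide array records how many exceptions precede each fixed-size block of positions so that a lookup only searches a small slice of that table. The names `build`, `lookup`, `EXCEPTION`, and `GUIDE_INTERVAL` are hypothetical.

```python
from bisect import bisect_left

EXCEPTION = 255      # assumed sentinel byte: the real value lives in the side table
GUIDE_INTERVAL = 4   # hypothetical block size; a real index would use a larger one


def build(values):
    """Pack non-negative integers into one byte each, spilling large ones."""
    packed = bytearray()
    exc_positions, exc_values = [], []
    guide = []  # guide[b] = number of exceptions before position b * GUIDE_INTERVAL
    for i, v in enumerate(values):
        if i % GUIDE_INTERVAL == 0:
            guide.append(len(exc_positions))
        if v < EXCEPTION:
            packed.append(v)
        else:
            packed.append(EXCEPTION)
            exc_positions.append(i)
            exc_values.append(v)
    guide.append(len(exc_positions))  # closing entry so guide[b + 1] always exists
    return packed, exc_positions, exc_values, guide


def lookup(i, packed, exc_positions, exc_values, guide):
    """Return the original integer stored at position i."""
    b = packed[i]
    if b != EXCEPTION:
        return b  # common case: a single byte read
    # Rare case: the guide array bounds the slice of the exception table to search.
    block = i // GUIDE_INTERVAL
    lo, hi = guide[block], guide[block + 1]
    k = bisect_left(exc_positions, i, lo, hi)
    return exc_values[k]


values = [3, 0, 300, 7, 255, 12, 1000, 2, 9]
index = build(values)
decoded = [lookup(i, *index) for i in range(len(values))]
```

In this sketch most lookups touch only the byte array, and the guide array keeps the binary search for an exception confined to the entries of a single block, which is one plausible reason such a layout could retrieve values quickly.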

Cite

Wu, T. D. (2016). Bitpacking techniques for indexing genomes: II. Enhanced suffix arrays. Algorithms for Molecular Biology, 11(1). https://doi.org/10.1186/S13015-016-0068-6
