Scalable and versatile k-mer indexing for high-throughput sequencing data

Niko Välimäki; Eric Rivals

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 7875 LNBI 237-248

DOI: 10.1007/978-3-642-38036-5_24

Abstract

Philippe et al. (2011) proposed a data structure called Gk arrays for indexing and querying large collections of high-throughput sequencing data in main memory. The data structure supports versatile queries for counting, locating, and analysing the coverage profile of k-mers in short-read data. The main drawback of Gk arrays is their space consumption, which can easily reach tens of gigabytes of main memory even for moderate-size inputs. We propose a compressed variant of Gk arrays that supports the same set of queries in both near-optimal time and space. In practice, the compressed Gk arrays scale up to much larger inputs, with query times highly competitive with those of their non-compressed predecessor. The main applications include variant calling, error correction, coverage profiling, and sequence assembly. © 2013 Springer-Verlag.
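To make the three query types named above concrete (counting a k-mer, locating its occurrences in the reads, and computing a read's coverage profile), here is a minimal, uncompressed sketch in Python. It is a hypothetical illustration, not the Gk arrays layout and not the compressed variant: the class name `KmerIndex` and its methods are invented for this example. It simply sorts every k-mer occurrence (never crossing a read boundary) so that all occurrences of one k-mer form a contiguous block that binary search can find.

```python
from bisect import bisect_left, bisect_right


class KmerIndex:
    """Toy, uncompressed k-mer index over a collection of short reads.

    Hypothetical sketch: it only illustrates the query types (count, locate,
    coverage profile); it is neither Gk arrays nor their compressed variant.
    """

    def __init__(self, reads, k):
        self.reads = reads
        self.k = k
        # Every k-mer occurrence as (read_id, offset); k-mers never span two reads.
        positions = [(r, i) for r, read in enumerate(reads)
                     for i in range(len(read) - k + 1)]
        # Sort occurrences by their k-mer, in the style of a suffix array.
        positions.sort(key=lambda p: self._kmer(*p))
        self.sorted_pos = positions
        self.sorted_kmers = [self._kmer(*p) for p in positions]

    def _kmer(self, read_id, offset):
        return self.reads[read_id][offset:offset + self.k]

    def _range(self, kmer):
        # All occurrences of one k-mer are contiguous in sorted order.
        lo = bisect_left(self.sorted_kmers, kmer)
        hi = bisect_right(self.sorted_kmers, kmer)
        return lo, hi

    def count(self, kmer):
        """Number of occurrences of `kmer` across all reads."""
        lo, hi = self._range(kmer)
        return hi - lo

    def locate(self, kmer):
        """All (read_id, offset) pairs where `kmer` occurs, in read order."""
        lo, hi = self._range(kmer)
        return sorted(self.sorted_pos[lo:hi])

    def coverage_profile(self, read_id):
        """Occurrence count of each successive k-mer along one read."""
        read = self.reads[read_id]
        return [self.count(read[i:i + self.k])
                for i in range(len(read) - self.k + 1)]
```

For example, with `reads = ["ACGTAC", "CGTACG", "TTACGT"]` and `k = 3`, `count("ACG")` is 3, `locate("ACG")` is `[(0, 0), (1, 3), (2, 2)]`, and `coverage_profile(0)` is `[3, 3, 2, 3]`. Storing every position and k-mer explicitly, as this sketch does, is the kind of memory cost the abstract describes; the sketch makes no attempt at the compression the paper proposes.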

Citation (APA)

Välimäki, N., & Rivals, E. (2013). Scalable and versatile k-mer indexing for high-throughput sequencing data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7875 LNBI, pp. 237–248). https://doi.org/10.1007/978-3-642-38036-5_24