Scalable and versatile k-mer indexing for high-throughput sequencing data

Abstract

Philippe et al. (2011) proposed a data structure called Gk arrays for indexing and querying large collections of high-throughput sequencing data in main memory. The data structure supports versatile queries for counting, locating, and analysing the coverage profile of k-mers in short-read data. The main drawback of Gk arrays is their space consumption, which can easily reach tens of gigabytes of main memory even for moderate-size inputs. We propose a compressed variant of Gk arrays that supports the same set of queries in near-optimal time and space. In practice, the compressed Gk arrays scale up to much larger inputs, with query times that remain highly competitive with those of their non-compressed predecessor. The main applications include variant calling, error correction, coverage profiling, and sequence assembly. © 2013 Springer-Verlag.
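
To make the query types named in the abstract concrete, the following is a minimal sketch of a naive, uncompressed k-mer index over a set of reads. It is not the Gk arrays or their compressed variant, only a hash-table stand-in that answers the same kind of questions. All function names are hypothetical, and the coverage profile is assumed here to mean, for each k-mer of a read, the number of distinct reads containing that k-mer.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to the list of (read_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for read_id, read in enumerate(reads):
        for offset in range(len(read) - k + 1):
            index[read[offset:offset + k]].append((read_id, offset))
    return index


def count_occurrences(index, kmer):
    """Counting query: total number of occurrences of kmer in all reads."""
    return len(index.get(kmer, ()))


def count_reads(index, kmer):
    """Counting query: number of distinct reads that contain kmer."""
    return len({read_id for read_id, _ in index.get(kmer, ())})


def locate(index, kmer):
    """Locating query: every (read_id, offset) position of kmer."""
    return list(index.get(kmer, ()))


def coverage_profile(index, read, k):
    """Assumed definition: per k-mer of `read`, how many reads share it."""
    return [count_reads(index, read[i:i + k])
            for i in range(len(read) - k + 1)]


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT", "ACGACG"]
    index = build_kmer_index(reads, 3)
    print(count_occurrences(index, "ACG"))
    print(locate(index, "CGA"))
    print(coverage_profile(index, reads[0], 3))
```

A table like this stores one entry per k-mer occurrence plus the k-mer strings themselves, which illustrates why an uncompressed index grows quickly with input size and why a compressed representation is attractive.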

Välimäki, N., & Rivals, E. (2013). Scalable and versatile k-mer indexing for high-throughput sequencing data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7875 LNBI, pp. 237–248). https://doi.org/10.1007/978-3-642-38036-5_24
