Run-length compressed indexes are superior for highly repetitive sequence collections

Abstract

A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed as lists of basic edit operations. This paper studies ways to store massive sets of highly repetitive sequence collections in a space-efficient manner while still supporting time-efficient retrieval of the content and queries on it. We show that the state-of-the-art entropy-bound full-text self-indexes do not yet provide satisfactory space bounds for this specific task. We engineer new structures that use run-length encoding and give empirical evidence that they are superior to the current structures. © 2009 Springer Berlin Heidelberg.
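The following Python sketch illustrates why run-length encoding can pay off on repetitive collections. It assumes a self-index built on the Burrows-Wheeler transform (BWT), as in the FM-index family. It computes the BWT of a small collection of near-identical copies and counts the runs of equal symbols in it. The BWT here is computed naively by sorting rotations, and the collection (`base`, the substitution positions) is hypothetical. This is not the paper's index structure. It only demonstrates the property such structures exploit: near-repeats tend to keep equal symbols together in the BWT, so the number of runs grows much more slowly than N.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform by sorting all rotations (naive, illustration only)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(seq: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal symbols into (symbol, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in seq:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def run_length_decode(runs: list[tuple[str, int]]) -> str:
    """Inverse of run_length_encode."""
    return "".join(ch * n for ch, n in runs)


if __name__ == "__main__":
    # Hypothetical base sequence of length n = 20.
    base = "ACGTTGCAAGCTTACGGATC"
    # Hypothetical collection: 10 copies, each with one substitution at a different position.
    copies = []
    for k in range(10):
        variant = list(base)
        pos = (7 * k) % len(base)
        variant[pos] = "A" if variant[pos] != "A" else "C"
        copies.append("".join(variant))
    collection = "".join(copies)  # total length N = 200

    b = bwt(collection)
    runs = run_length_encode(b)
    print(f"N = {len(collection)}, BWT length = {len(b)}, runs = {len(runs)}")
```

An index storing one (symbol, length) pair per run needs space proportional to the number of runs rather than to N. On highly repetitive input, that number can be far smaller.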

Citation (APA)

Sirén, J., Välimäki, N., Mäkinen, V., & Navarro, G. (2008). Run-length compressed indexes are superior for highly repetitive sequence collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5280 LNCS, pp. 164–175). Springer Verlag. https://doi.org/10.1007/978-3-540-89097-3_17