Reference sequence construction for relative compression of genomes

Abstract

Relative compression, where a set of similar strings is compressed with respect to a reference string, is an effective method of compressing DNA datasets containing multiple similar sequences. Moreover, it supports rapid random access to the underlying data. The main difficulty of relative compression is selecting an appropriate reference sequence. In this paper, we explore using the dictionaries of repeats generated by the COMRAD, RE-PAIR and DNA-X algorithms as reference sequences for relative compression. We show that this technique allows for better compression, and allows more general repetitive datasets to be compressed using relative compression. © 2011 Springer-Verlag.
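
To make the idea of compressing a string with respect to a reference concrete, the following is a minimal sketch and not the authors' implementation. It assumes a greedy longest-match factorization in which each part of the target is stored either as a (position, length) copy from the reference or as a single literal character. The names rlz_compress and rlz_decompress are hypothetical, and the naive quadratic match search stands in for the index structure (such as a suffix array) that a practical tool would likely use.

```python
from typing import List, Tuple, Union

# A factor is either a (position, length) copy from the reference
# or a single literal character that does not occur in the reference.
Factor = Union[Tuple[int, int], str]


def rlz_compress(reference: str, target: str) -> List[Factor]:
    """Greedily factorize target into copies from reference (illustrative sketch)."""
    factors: List[Factor] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Naive search for the longest match of target[i:] anywhere in the
        # reference; assumed here for clarity, not efficiency.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len == 0:
            factors.append(target[i])  # no match at all: emit a literal
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decompress(reference: str, factors: List[Factor]) -> str:
    """Rebuild the target by copying each factor out of the reference."""
    parts = []
    for factor in factors:
        if isinstance(factor, str):
            parts.append(factor)
        else:
            pos, length = factor
            parts.append(reference[pos:pos + length])
    return "".join(parts)


reference = "ACGTACGGT"
target = "ACGGTTACGN"
factors = rlz_compress(reference, target)
restored = rlz_decompress(reference, factors)
```

Because each factor can be decoded from the reference independently of the others, a scheme of this shape is compatible with the random access mentioned in the abstract. The fewer and longer the factors, the better the compression, which is why the choice of reference matters.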

Cite

Kuruppu, S., Puglisi, S. J., & Zobel, J. (2011). Reference sequence construction for relative compression of genomes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7024 LNCS, pp. 420–425). https://doi.org/10.1007/978-3-642-24583-1_41
