Reference sequence construction for relative compression of genomes

Shanika Kuruppu; Simon J. Puglisi; Justin Zobel

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7024 LNCS, pp. 420-425 (2011)

DOI: 10.1007/978-3-642-24583-1_41

Abstract

Relative compression, in which a set of similar strings is compressed with respect to a reference string, is an effective method for compressing DNA datasets that contain multiple similar sequences. Moreover, it supports rapid random access to the underlying data. The main difficulty in relative compression is selecting an appropriate reference sequence. In this paper, we explore using the dictionary of repeats generated by the COMRAD, RE-PAIR and DNA-X algorithms as reference sequences for relative compression. We show that this technique allows for better compression and allows more general repetitive datasets to be compressed using relative compression. © 2011 Springer-Verlag.
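To make the idea of compressing a string with respect to a reference concrete, the following is a minimal Python sketch of greedy longest-match factorization, in which the target is encoded as a list of (position, length) copies from the reference. It is an illustration under stated assumptions, not the authors' implementation: the function names `relative_compress` and `relative_decompress`, the literal encoding `(symbol, 0)`, and the naive quadratic search are all hypothetical choices made for readability. It does not show how a reference would be built from a dictionary of repeats.

```python
def relative_compress(reference, target):
    """Greedily factorize target into (position, length) copies from reference.

    A symbol that does not occur in the reference is emitted as a literal
    factor (symbol, 0). This encoding is an assumption of the sketch.
    """
    factors = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Naive search over every start position in the reference.
        # A practical tool would use an index such as a suffix array.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and i + length < len(target)
                   and reference[start + length] == target[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        if best_len == 0:
            factors.append((target[i], 0))  # literal symbol
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def relative_decompress(reference, factors):
    """Rebuild the target by copying each factor out of the reference."""
    pieces = []
    for pos, length in factors:
        if length == 0:
            pieces.append(pos)  # literal: pos holds the symbol itself
        else:
            pieces.append(reference[pos:pos + length])
    return "".join(pieces)


if __name__ == "__main__":
    reference = "ACGTACGTTT"
    target = "ACGTTTACGA"
    factors = relative_compress(reference, target)
    print(factors)
    print(relative_decompress(reference, factors) == target)
```

Because every factor is a pointer into the reference, a target that closely resembles the reference needs only a few factors. This is why the choice of reference matters so much for the compression achieved.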

Citation (APA)

Kuruppu, S., Puglisi, S. J., & Zobel, J. (2011). Reference sequence construction for relative compression of genomes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7024 LNCS, pp. 420–425). https://doi.org/10.1007/978-3-642-24583-1_41
