A DNA sequence corpus for compression benchmark

Abstract

Progress in sequencing technologies and the increasing availability of DNA sequences from extant and extinct organisms are shaping our knowledge of species origin and development, and are driving improvements in the computational methods used for storage and analysis. Given the large volume of DNA sequences, computational models that efficiently represent diverse DNA sequences using few computational resources are very welcome. Currently, there is no standard corpus for benchmarking compression algorithms that enables a wide and fair comparison. Such a corpus should reflect the main domains and kingdoms without being excessive in size or number of sequences. In this paper, we provide such a DNA sequence corpus, give an overview of its elements, and compare several algorithms for DNA sequence compression. The corpus is available at https://tinyurl.com/DNAcorpus.

Cite

Pratas, D., & Pinho, A. J. (2019). A DNA sequence corpus for compression benchmark. In Advances in Intelligent Systems and Computing (Vol. 803, pp. 208–215). Springer Verlag. https://doi.org/10.1007/978-3-319-98702-6_25
