A DNA sequence corpus for compression benchmark

Diogo Pratas; Armando J. Pinho

Conference Proceedings

Advances in Intelligent Systems and Computing (2019) 803 208-215

DOI: 10.1007/978-3-319-98702-6_25

Abstract

The progress in sequencing technologies and the increasing availability of DNA sequences from extant and extinct organisms are shaping our knowledge about species origin and development, and are driving improvements in computational methods for storage and analysis. Given the large volume of DNA sequences, computational models that efficiently represent diverse DNA sequences using few computational resources are very welcome. Currently, there is no standard corpus for benchmarking compression algorithms that enables a wide and fair comparison. Such a corpus should reflect the main domains and kingdoms without being excessive in size or number of sequences. In this paper, we provide such a DNA sequence corpus, give an overview of its elements, and compare some of the algorithms for DNA sequence compression. The corpus is available at https://tinyurl.com/DNAcorpus.

Cite

Pratas, D., & Pinho, A. J. (2019). A DNA sequence corpus for compression benchmark. In Advances in Intelligent Systems and Computing (Vol. 803, pp. 208–215). Springer Verlag. https://doi.org/10.1007/978-3-319-98702-6_25
