The progress in sequencing technologies and the increasing availability of DNA sequences from extant and extinct organisms are shaping our knowledge about species origin and development, and are driving improvements in the computational methods for storage and analysis. Given the large volume of DNA sequences, computational models that efficiently represent diverse DNA sequences using low computational resources are very welcome. Currently, there is no standard corpus for benchmarking compression algorithms that enables a wide and fair comparison. Such a corpus should reflect the main domains and kingdoms without being excessive in size or number of sequences. In this paper, we provide such a DNA sequence corpus, give an overview of its elements, and compare some of the algorithms for DNA sequence compression. The corpus is available at https://tinyurl.com/DNAcorpus.
CITATION STYLE
Pratas, D., & Pinho, A. J. (2019). A DNA sequence corpus for compression benchmark. In Advances in Intelligent Systems and Computing (Vol. 803, pp. 208–215). Springer Verlag. https://doi.org/10.1007/978-3-319-98702-6_25