Compression of nucleotide databases for fast searching


Abstract

Motivation: International sequencing efforts are creating huge nucleotide databases, which are used in searching applications to locate sequences homologous to a query sequence. In such applications, it is desirable that databases be stored compactly, that sequences can be accessed independently of the order in which they were stored, and that data can be rapidly retrieved from secondary storage, since disk costs are often the bottleneck in searching.

Results: We present a purpose-built direct coding scheme for fast retrieval and compression of genomic nucleotide data. The scheme is lossless, readily integrated with sequence search tools, and does not require a model. Direct coding gives good compression and allows faster retrieval than with either uncompressed data or data compressed by other methods, thus yielding significant improvements in search times for high-speed homology search tools.

Availability: The direct coding scheme (cino) is available free of charge by anonymous ftp from goanna.cs.rmit.edu.au in the directory pub/rmit/cino.

Contact: E-mail: hugh@cs.rmit.edu.au.

© 1997, Oxford University Press.
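
The abstract does not give the details of the coding, so the following is only a minimal sketch of the general idea behind a model-free direct code, not the cino format itself. It assumes sequences contain only the four bases A, C, G and T and packs each base into 2 bits; the names `pack` and `unpack` are hypothetical. Real nucleotide databases also contain wildcard characters such as N, which a lossless scheme has to record separately and which this sketch deliberately omits.

```python
# Hypothetical 2-bit direct coding of nucleotides (illustration only, not cino).
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> tuple[bytes, int]:
    """Pack a string over A/C/G/T into 2 bits per base.

    Returns the packed bytes and the sequence length, which is needed
    to ignore padding bits in the final byte when decoding.
    Raises KeyError on any other character (wildcards are not handled).
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODES[base] << (2 * (i % 4))
    return bytes(out), len(seq)


def unpack(data: bytes, length: int) -> str:
    """Decode `length` bases from bytes produced by pack()."""
    return "".join(
        BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)
    )


if __name__ == "__main__":
    data, n = pack("ACGTAC")
    print(len(data), unpack(data, n))
```

Because every base occupies a fixed number of bits and no statistical model has to be built or consulted, decoding is a few shifts and masks per byte, and any sequence can be decoded on its own. That is consistent with the independent access and fast retrieval the abstract emphasizes, though the actual scheme may differ in its details.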


Williams, H., & Zobel, J. (1997). Compression of nucleotide databases for fast searching. Bioinformatics, 13(5), 549–554. https://doi.org/10.1093/bioinformatics/13.5.549
