Indexing DNA sequences using q-grams

Abstract

Recent years have seen growing interest in similarity search over large collections of biological sequences. This paper presents a q-gram-based method for indexing DNA sequences efficiently, so that similarity search in a DNA database can avoid a linear scan of the entire database. A two-level index, consisting of a hash table and c-trees, is proposed based on the q-grams of the DNA sequences. The proposed data structures allow quick detection of sequences within a certain distance of the query sequence. Experimental results show that our method detects similar regions in a DNA sequence database efficiently and with high sensitivity. © Springer-Verlag Berlin Heidelberg 2005.
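
As a rough illustration of the general idea of a q-gram hash index, the sketch below maps every overlapping q-gram to the positions where it occurs and then ranks database sequences by how many q-gram occurrences they share with a query. It is not the authors' implementation: the c-tree level is omitted entirely, and the function names (`qgrams`, `build_index`, `candidates`), the `min_shared` threshold, and the toy sequences are all assumptions made for this example.

```python
from collections import defaultdict


def qgrams(seq, q):
    """Yield (offset, q-gram) for every overlapping window of length q."""
    for i in range(len(seq) - q + 1):
        yield i, seq[i:i + q]


def build_index(sequences, q):
    """Map each q-gram to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for offset, gram in qgrams(seq, q):
            index[gram].append((seq_id, offset))
    return index


def candidates(index, query, q, min_shared):
    """Count shared q-gram occurrences per sequence; keep those >= min_shared.

    min_shared is a hypothetical filtering threshold chosen by the caller.
    """
    hits = defaultdict(int)
    for _, gram in qgrams(query, q):
        # .get avoids inserting empty entries into the defaultdict
        for seq_id, _ in index.get(gram, ()):
            hits[seq_id] += 1
    return {seq_id: n for seq_id, n in hits.items() if n >= min_shared}


# Toy database (made-up sequences, for illustration only)
db = {"s1": "ACGTACGTGA", "s2": "TTTTGGGGCC", "s3": "GACGTACCTA"}
idx = build_index(db, q=3)
result = candidates(idx, "ACGTAC", q=3, min_shared=3)
```

Only the sequences returned by `candidates` would need a full alignment or distance computation, which is how an index of this kind can avoid scanning the whole database.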

Cite

Cao, X., Li, S. C., & Tung, A. K. H. (2005). Indexing DNA sequences using q-grams. In Lecture Notes in Computer Science (Vol. 3453, pp. 4–16). Springer Verlag. https://doi.org/10.1007/11408079_4
