Interest in similarity search over large collections of biological sequences has grown in recent years. This paper presents a method for efficiently indexing DNA sequences based on q-grams, which facilitates similarity search in a DNA database and avoids a linear scan of the entire database. A two-level index, consisting of a hash table and c-trees, is built from the q-grams of the DNA sequences. These data structures allow quick detection of sequences within a certain distance of the query sequence. Experimental results show that the method detects similar regions in a DNA sequence database efficiently and with high sensitivity. © Springer-Verlag Berlin Heidelberg 2005.
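To make the idea of a q-gram index concrete, the following is a minimal sketch of a hash table keyed by q-grams (overlapping substrings of length q). It is an illustration under stated assumptions, not the authors' implementation: the function names `qgrams`, `build_index`, and `candidates` are hypothetical, the toy sequences are invented, and the c-tree second level is not reproduced because the abstract does not describe it.

```python
from collections import defaultdict


def qgrams(seq, q):
    """Yield (position, q-gram) for every overlapping window of length q."""
    for i in range(len(seq) - q + 1):
        yield i, seq[i:i + q]


def build_index(sequences, q):
    """Hash table mapping each q-gram to the (sequence id, position) pairs
    where it occurs. Illustrative only; the paper's index also uses c-trees."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos, gram in qgrams(seq, q):
            index[gram].append((seq_id, pos))
    return index


def candidates(index, query, q):
    """Count q-gram hits per database sequence for a query.
    Sequences with many hits are candidates for closer comparison,
    so the whole database need not be scanned linearly."""
    counts = defaultdict(int)
    for _, gram in qgrams(query, q):
        for seq_id, _ in index.get(gram, ()):
            counts[seq_id] += 1
    return dict(counts)


# Toy data, invented for illustration.
sequences = {"s1": "ACGTACGT", "s2": "TTTTGGGG"}
index = build_index(sequences, 3)
hits = candidates(index, "ACGTT", 3)
```

In this sketch only sequences that share at least one q-gram with the query are ever touched, which is the general intuition behind using a q-gram index as a filter before a more expensive distance computation.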
CITATION STYLE
Cao, X., Li, S. C., & Tung, A. K. H. (2005). Indexing DNA sequences using q-grams. In Lecture Notes in Computer Science (Vol. 3453, pp. 4–16). Springer Verlag. https://doi.org/10.1007/11408079_4