Indexing DNA sequences using q-grams

Xia Cao; Shuai Cheng Li; Anthony K.H. Tung

Conference Proceedings


Lecture Notes in Computer Science, vol. 3453, pp. 4–16 (2005)

DOI: 10.1007/11408079_4


Abstract

Recent years have seen growing interest in similarity search over large collections of biological sequences. Contributing to this line of work, this paper presents a method for efficiently indexing DNA sequences based on q-grams, so that similarity search in a DNA database can avoid a linear scan of the entire database. A two-level index, consisting of a hash table and c-trees, is proposed based on the q-grams of the DNA sequences. These data structures allow quick detection of sequences within a given distance of the query sequence. Experimental results show that the method detects similarity regions in a DNA sequence database efficiently and with high sensitivity. © Springer-Verlag Berlin Heidelberg 2005.
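The abstract does not detail the index structures, so the following is only a rough sketch of the general idea of q-gram filtering, not the authors' method. It assumes a hash table mapping each q-gram to the IDs of the sequences that contain it, combined with the standard q-gram count filter (the q-gram lemma): if two sequences are within edit distance k, at least |query| - q + 1 - kq of the query's q-gram positions must occur in the other sequence. Candidates that pass the filter are then verified with an exact edit-distance computation. The c-tree second level is not modelled; the function names (`build_qgram_index`, `candidates`, `search`) are hypothetical, and whole-sequence edit distance stands in for the paper's notion of similarity regions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def qgrams(seq: str, q: int) -> List[str]:
    """All overlapping q-grams of seq, in order (with repeats)."""
    return [seq[i:i + q] for i in range(len(seq) - q + 1)]


def build_qgram_index(db: Dict[str, str], q: int) -> Dict[str, Set[str]]:
    """Hypothetical one-level hash index: q-gram -> IDs of sequences containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, seq in db.items():
        for g in qgrams(seq, q):
            index[g].add(seq_id)
    return index


def candidates(index: Dict[str, Set[str]], all_ids: Iterable[str],
               query: str, q: int, k: int) -> Set[str]:
    """q-gram count filter (assumed technique): no false dismissals for edit distance <= k."""
    threshold = len(query) - q + 1 - k * q
    if threshold <= 0:
        # The filter has no pruning power here; every sequence must be verified.
        return set(all_ids)
    counts: Counter = Counter()
    # Count, per sequence, how many query q-gram positions occur in it.
    for g in qgrams(query, q):
        for seq_id in index.get(g, ()):
            counts[seq_id] += 1
    return {seq_id for seq_id, c in counts.items() if c >= threshold}


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


def search(db: Dict[str, str], index: Dict[str, Set[str]],
           query: str, q: int, k: int) -> List[str]:
    """Filter with the q-gram index, then verify candidates exactly."""
    hits = candidates(index, db.keys(), query, q, k)
    return sorted(s for s in hits if edit_distance(db[s], query) <= k)


if __name__ == "__main__":
    db = {"s1": "ACGTACGTAC", "s2": "ACGTTCGTAC",
          "s3": "TTTTTTTTTT", "s4": "ACGTACGTACGTACGT"}
    idx = build_qgram_index(db, q=3)
    print(candidates(idx, db.keys(), "ACGTACGTAC", q=3, k=1))
    print(search(db, idx, "ACGTACGTAC", q=3, k=1))
```

In this toy run, `s3` is pruned by the filter without any distance computation, while `s4` passes the filter (it contains every query q-gram) but is rejected at verification. This illustrates why a q-gram index is used as a fast, lossless filter in front of a more expensive exact check.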

Citation (APA)

Cao, X., Li, S. C., & Tung, A. K. H. (2005). Indexing DNA sequences using q-grams. In Lecture Notes in Computer Science (Vol. 3453, pp. 4–16). Springer Verlag. https://doi.org/10.1007/11408079_4
