Towards computational improvement of DNA database indexing and short DNA query searching

Done Stojanov; Sašo Koceski; Aleksandra Mileva; Nataša Koceska; Cveta Martinovska Bande

Journal Article, Open Access

Biotechnology and Biotechnological Equipment (2014) 28(5) 958-967

DOI: 10.1080/13102818.2014.959711

Abstract

To facilitate and speed up the search of massive DNA databases, the database is first indexed using a mapping function. Exact query hits can then be identified by searching the indexed data structure. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, the starting locations and the number of potential genes can be determined. This is particularly relevant when unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries are time-demanding processes. In this paper, we propose a fast DNA database indexing and searching approach that identifies all query hits in the database without having to examine all entries in the indexed data structure, while limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions p ≤ k − |q| are not reported if the database is searched against a query shorter than k nucleotides, where k is the length of the DNA database words being mapped and |q| is the length of the query. A solution to this drawback is also presented.
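The position gap described in the abstract can be illustrated with a small sketch. It is not the paper's indexing equation and not a reproduction of Reneker's method: the 2-bit encoding, the function names (`encode`, `build_index`, `search_by_suffix`, `naive_search`) and the choice to match a short query against the trailing |q| nucleotides of each indexed k-length word are all assumptions made for illustration. Under those assumptions, an occurrence starting at position p (1-based) is only visible through a word starting at p − (k − |q|), so occurrences with p ≤ k − |q| have no such word and go unreported.

```python
from collections import defaultdict

# Assumed 2-bit encoding of nucleotides (illustrative, not taken from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Map a DNA word to an integer, 2 bits per nucleotide (first base most significant)."""
    code = 0
    for base in word:
        code = code * 4 + ENCODE[base]
    return code


def build_index(db, k):
    """Index every k-length word of db: integer code -> list of 1-based start positions."""
    index = defaultdict(list)
    for start in range(len(db) - k + 1):
        index[encode(db[start:start + k])].append(start + 1)
    return index


def search_by_suffix(index, query, k):
    """Find a query shorter than k by matching it against the last |q| bases of each word.

    This naive version scans every index entry; it only serves to expose the gap.
    """
    shift = k - len(query)          # offset of the query inside a matching word
    q_code = encode(query)
    modulus = 4 ** len(query)       # keeps the low 2*|q| bits, i.e. the word's suffix
    hits = []
    for code, starts in index.items():
        if code % modulus == q_code:
            hits.extend(s + shift for s in starts)
    return sorted(hits)


def naive_search(db, query):
    """Reference scan of the raw sequence, returning 1-based start positions."""
    n = len(query)
    return [i + 1 for i in range(len(db) - n + 1) if db[i:i + n] == query]


db, k, query = "ACGTACGACG", 4, "AC"
index = build_index(db, k)
missed = sorted(set(naive_search(db, query)) - set(search_by_suffix(index, query, k)))
print(missed)
```

For this toy sequence with k = 4 and |q| = 2, the reference scan finds the query at positions 1, 5 and 8, while the suffix-based lookup reports only 5 and 8; the missed hit at position 1 satisfies p ≤ k − |q| = 2.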

Stojanov, D., Koceski, S., Mileva, A., Koceska, N., & Bande, C. M. (2014). Towards computational improvement of DNA database indexing and short DNA query searching. Biotechnology and Biotechnological Equipment, 28(5), 958–967. https://doi.org/10.1080/13102818.2014.959711
