Biological Sequences and the Exact String Matching Problem

Book Chapter

Birkhäuser-Verlag, (2006), 43-63

DOI: 10.1007/3-7643-7387-3_3

Abstract

In computational biology one often needs to find the occurrences of some pattern P in a text T. Since the texts of computational biology include genome sequences, which tend to be large, it is important to apply efficient methods of string matching. Traditional string matching methods are guaranteed to take time O(n), where n is the length of the text. By preprocessing a set of patterns into a keyword tree, this time bound can be extended to set matching. Instead of preprocessing one or more patterns, it is also possible to preprocess the text. A suffix tree is a data structure that can be constructed for a given text in time O(n). Once it is constructed, it can be used to search for any P in T in time O(m), where m is the length of the pattern. In addition to making string searching extremely efficient, a suffix tree reveals in one fell swoop the entire repeat structure of T without the need for carrying out any string comparisons. This has important biological applications, since unique and repeat sequences play a central role in many fundamental as well as biotechnological problems. Finally, suffix trees can also be used for rapid inexact string matching, where at most k mismatches between P and its occurrence in T are allowed.
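The two preprocessing strategies described in the abstract can be illustrated with a small sketch. The abstract does not name a specific algorithm, so the choices here are assumptions: Knuth-Morris-Pratt stands in for a pattern-preprocessing method that scans the text in linear time, and a naive suffix trie stands in for a text-preprocessing structure. Unlike a true suffix tree, the trie below takes quadratic time and space to build; it only demonstrates that a lookup walks at most m characters once the text is indexed. All function names are hypothetical.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the KMP failure function)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return all 0-based start positions of pattern in text.
    Preprocesses the pattern, then scans the text once."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


def build_suffix_trie(text):
    """Naive suffix trie as nested dicts. Assumption: illustration only;
    this costs O(n^2), whereas a real suffix tree is built in O(n)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def trie_contains(root, pattern):
    """Check whether pattern occurs in the indexed text by walking
    at most len(pattern) edges from the root."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True
```

For instance, `kmp_search("GATTACATTAG", "TTA")` returns `[2, 7]`, and after `trie = build_suffix_trie("GATTACATTAG")` the call `trie_contains(trie, "ACAT")` is true regardless of how long the text is, because only the pattern is traversed.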

Cite

Biological Sequences and the Exact String Matching Problem. (2006). In Introduction to Computational Biology (pp. 43–63). Birkhäuser-Verlag. https://doi.org/10.1007/3-7643-7387-3_3
