Indexing factors in DNA/RNA sequences


Abstract

In this paper, we introduce the Truncated Generalized Suffix Automaton (TGSA) and present an efficient on-line algorithm for its construction. TGSA is a novel type of finite automaton suitable for indexing DNA and RNA sequences where the text is degenerate, i.e., contains sets of characters. TGSA indexes the so-called k-factors: the factors of the degenerate text whose length does not exceed a given constant k. The presented algorithm runs in O(n^2) time, where n is the length of the input DNA/RNA sequence. The number of states of the resulting TGSA is at most linear in the length of the text. TGSA enables us to find the list occ(u) of all occurrences of a given pattern u in a degenerate text x in time |u| + |occ(u)|. © Springer-Verlag Berlin Heidelberg 2008.
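To make the notions of a degenerate text, its k-factors, and the occurrence list occ(u) concrete, here is a minimal sketch in Python. It is not the TGSA construction from the paper: it uses a plain dictionary built by brute-force enumeration, which can grow exponentially with the set sizes. The function names `k_factor_index` and `occ`, the representation of the text as a list of character sets, and the use of 0-based start positions are all assumptions made for illustration.

```python
from itertools import product


def k_factor_index(text, k):
    """Naive k-factor index for a degenerate text (illustrative only).

    `text` is a list of sets of characters, one set per position.
    Returns a dict mapping each factor of length 1..k to the list of
    0-based start positions at which it occurs.
    """
    index = {}
    n = len(text)
    for start in range(n):
        # Consider every window of length 1..k that fits in the text.
        for length in range(1, min(k, n - start) + 1):
            window = text[start:start + length]
            # Expand the window into every concrete string it can spell.
            for combo in product(*(sorted(chars) for chars in window)):
                index.setdefault("".join(combo), []).append(start)
    return index


def occ(index, u):
    """Return the start positions of pattern u, or an empty list."""
    return index.get(u, [])


# Hypothetical degenerate text: position 1 may be either C or G.
x = [{"A"}, {"C", "G"}, {"T"}, {"A"}]
index = k_factor_index(x, 2)
```

With this index, `occ(index, "AG")` yields the positions where the pattern matches the degenerate text, while any pattern longer than k (here 2) is simply not indexed. The automaton described in the abstract serves the same kind of query while keeping the number of states linear in the text length.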

Citation (APA)

Flouri, T., Iliopoulos, C., Sohel Rahman, M., Vagner, L., & Voràček, M. (2008). Indexing factors in DNA/RNA sequences. Communications in Computer and Information Science, 13, 436–445. https://doi.org/10.1007/978-3-540-70600-7_33
