In this paper, we present the Truncated Generalized Suffix Automaton (TGSA) together with an efficient on-line algorithm for its construction. TGSA is a novel type of finite automaton suitable for indexing DNA and RNA sequences in which the text is degenerate, i.e., contains sets of characters. TGSA indexes the so-called k-factors: the factors of the degenerate text whose length does not exceed a given constant k. The presented algorithm runs in O(n²) time, where n is the length of the input DNA/RNA sequence. The resulting TGSA has at most a linear number of states with respect to the length of the text. TGSA enables us to find the list occ(u) of all occurrences of a given pattern u in a degenerate text x in time |u| + |occ(u)|. © Springer-Verlag Berlin Heidelberg 2008.
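To make the notion of k-factors concrete, the following is a minimal brute-force sketch in Python. It is not the TGSA construction from the paper: it simply enumerates every factor of length at most k and stores it in a dictionary. The function name `k_factor_index`, the representation of a degenerate text as a list of character sets, and the choice to report occurrences as 0-based start positions are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def k_factor_index(text, k):
    """Map each factor of length 1..k of a degenerate text to its start positions.

    `text` is a list of sets of characters (an assumed representation);
    a position holding more than one character is a degenerate symbol.
    This is a naive enumeration, not the TGSA algorithm.
    """
    index = defaultdict(set)
    n = len(text)
    for start in range(n):
        for length in range(1, min(k, n - start) + 1):
            window = text[start:start + length]
            # Expand the window into every concrete string it can spell.
            for combo in product(*window):
                index["".join(combo)].add(start)
    return {factor: sorted(positions) for factor, positions in index.items()}


# Hypothetical degenerate text: position 1 may be either A or C.
text = [{"A"}, {"A", "C"}, {"G"}, {"A"}]
index = k_factor_index(text, 2)
occurrences_of_AG = index.get("AG", [])
```

Looking up a pattern u of length at most k in such a dictionary returns its occurrence list directly, which mirrors the kind of query TGSA answers, although the naive index can grow much larger than an automaton with a linear number of states.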
CITATION STYLE
Flouri, T., Iliopoulos, C., Sohel Rahman, M., Vagner, L., & Voràček, M. (2008). Indexing factors in DNA/RNA sequences. Communications in Computer and Information Science, 13, 436–445. https://doi.org/10.1007/978-3-540-70600-7_33