Designing external memory data structures for string databases is of significant recent interest due to the proliferation of biological sequence data. The suffix tree is an important indexing structure that provides optimal algorithms for memory-bound data. However, string B-trees provide the best known asymptotic performance in external memory for substring search and update operations. Work on external memory variants of suffix trees has largely focused on constructing suffix trees in external memory or on layout schemes for suffix trees that preserve link locality. In this paper, we present a new suffix tree layout scheme for secondary storage, together with construction, substring search, insertion, and deletion algorithms that are competitive with the string B-tree. For a set of strings of total length n, a pattern p, and disk blocks of size B, we provide a substring search algorithm that uses O(|p|/B + log_B n) disk accesses. We present algorithms for insertion and deletion of all suffixes of a string of length m that take O(m log_B (n + m)) and O(m log_B n) disk accesses, respectively. Our results demonstrate that suffix trees can be directly used as efficient secondary storage data structures for string and sequence data. © Springer-Verlag Berlin Heidelberg 2006.
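To give a feel for the bounds stated above, the following minimal sketch evaluates their leading terms for concrete values. It is only an illustration: the function names are hypothetical, constant factors hidden by the O-notation are ignored, and nothing here reflects the paper's actual algorithms or layout scheme.

```python
import math


def search_accesses(pattern_len: int, n: int, block_size: int) -> float:
    """Leading term of the search bound: |p|/B + log_B n (constants ignored)."""
    return pattern_len / block_size + math.log(n, block_size)


def insert_accesses(m: int, n: int, block_size: int) -> float:
    """Leading term of the insertion bound: m * log_B (n + m)."""
    return m * math.log(n + m, block_size)


def delete_accesses(m: int, n: int, block_size: int) -> float:
    """Leading term of the deletion bound: m * log_B n."""
    return m * math.log(n, block_size)


if __name__ == "__main__":
    # Assumed example values: 2^30 characters indexed, 1024-character blocks.
    n, B = 2**30, 1024
    print("search, |p| = 1024:", search_accesses(1024, n, B))
    print("insert, m = 1000:  ", insert_accesses(1000, n, B))
    print("delete, m = 1000:  ", delete_accesses(1000, n, B))
```

With these assumed values the search term comes to roughly 4, since |p|/B = 1 and log_B n = 3; the point is that the logarithm is taken to base B, so the tree-descent cost stays small even for very large n.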
CITATION STYLE
Ko, P., & Aluru, S. (2006). Obtaining provably good performance from suffix trees in secondary storage. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4009 LNCS, pp. 72–83). Springer Verlag. https://doi.org/10.1007/11780441_8