Distributed and paged suffix trees for large genetic databases

Raphaël Clifford; Marek Sergot

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2003) 2676 70-82

DOI: 10.1007/3-540-44888-8_6

Abstract

We present two new variants of the suffix tree that allow much larger genome sequence databases to be handled efficiently. The method is based on a new linear-time construction algorithm for "sparse" suffix trees, which are subtrees of the whole suffix tree. The new data structures are called the paged suffix tree (PST) and the distributed suffix tree (DST). Both tackle the memory bottleneck by constructing subtrees of the full suffix tree independently; the PST is designed for single-processor environments and the DST for distributed-memory parallel computing environments (e.g. Beowulf clusters). The standard operations on suffix trees of biological importance are shown to be easily translatable to these new data structures. None of these operations on the DST requires interprocess communication, and many have optimal expected parallel running times. © Springer-Verlag Berlin Heidelberg 2003.
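The idea of building subtrees of the full suffix tree independently can be illustrated with a small Python sketch. It is not the authors' linear-time construction: it assumes, purely for illustration, that suffixes are partitioned by their first `k` characters, and it builds each subtree as a naive uncompressed trie in quadratic time. The names `suffix_prefixes`, `build_subtree`, `leaves`, and `find` are hypothetical.

```python
LEAF = None  # key marking the end of a suffix; never clashes with a character


def suffix_prefixes(text, k):
    """Distinct prefixes (up to length k) of all suffixes of `text`."""
    return sorted({text[i:i + k] for i in range(len(text))})


def build_subtree(text, prefix):
    """Naive trie over only those suffixes of `text` that start with `prefix`.

    Each call reads the text alone, so subtrees can be built one at a
    time (paged) or on separate workers (distributed).
    """
    root = {}
    for i in range(len(text)):
        if text.startswith(prefix, i):
            node = root
            for ch in text[i:]:
                node = node.setdefault(ch, {})
            node[LEAF] = i  # record where this suffix starts
    return root


def leaves(node):
    """Start positions of every suffix stored below `node`."""
    stack, out = [node], []
    while stack:
        cur = stack.pop()
        for key, child in cur.items():
            if key is LEAF:
                out.append(child)
            else:
                stack.append(child)
    return out


def find(subtrees, pattern):
    """All occurrences of `pattern`, answered subtree by subtree."""
    hits = []
    for prefix, root in subtrees.items():
        # Skip subtrees whose prefix disagrees with the pattern where they overlap.
        m = min(len(prefix), len(pattern))
        if prefix[:m] != pattern[:m]:
            continue
        node = root
        for ch in pattern:
            node = node.get(ch)
            if node is None:
                break
        else:
            hits.extend(leaves(node))
    return sorted(hits)


text = "banana$"  # "$" is a unique terminator (an assumption of this sketch)
subtrees = {p: build_subtree(text, p) for p in suffix_prefixes(text, 2)}
print(find(subtrees, "ana"))
```

In this sketch each subtree answers its part of a query without consulting any other subtree, and the partial results are simply concatenated, which mirrors the abstract's remark that DST operations need no interprocess communication.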

Cite

Clifford, R., & Sergot, M. (2003). Distributed and paged suffix trees for large genetic databases. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2676, 70–82. https://doi.org/10.1007/3-540-44888-8_6
