In this study, we present an online suffix tree construction approach where multiple sequences are indexed by a single suffix tree. Due to the poor memory locality and high space consumption, online suffix tree construction on disk is a striving process. Even more, performance of the construction suffers when alphabet size is large. In order to overcome these difficulties, first, we present a space efficient node representation approach to be used in Ukkonen suffix tree construction algorithm. Next, we show that performance can be increased through incorporating semantic knowledge such as utilizing the frequently used letters of an alphabet. In particular, we estimate the frequently accessed nodes of the tree and introduce a sequence insertion strategy into the tree. As a result, we can speed up accessing to the frequently accessed nodes. Finally, we analyze the contribution of buffering strategies and page sizes on performance and perform detailed tests. We run a series of experimentation under various buffering strategies and page sizes. Experimental results showed that our approach outperforms existing ones. © 2008 Springer-Verlag.
CITATION STYLE
Ozcan, G., & Alpkocak, A. (2008). Online suffix tree construction for streaming sequences. In Communications in Computer and Information Science (Vol. 6 CCIS, pp. 69–81). https://doi.org/10.1007/978-3-540-89985-3_9
Mendeley helps you to discover research relevant for your work.