Sequences of symbols can represent data in many domains, such as text documents, activity logs, customer transactions, and website click-streams. Sequence prediction is a popular task that consists of predicting the next symbol of a sequence, given a set of training sequences. Although numerous prediction models have been proposed, many have low accuracy because they are lossy (they discard information from the training sequences when building the model), whereas lossless models are often more accurate but typically consume a large amount of memory. This paper addresses these issues by proposing a novel sequence prediction model named SuBSeq, which is lossless and uses the succinct Wavelet Tree data structure and the Burrows-Wheeler Transform to compactly store and efficiently access training sequences for prediction. An experimental evaluation shows that SuBSeq has very low memory consumption and excellent accuracy compared to eight state-of-the-art predictors on seven real datasets.
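As background for the abstract, the Burrows-Wheeler Transform that SuBSeq builds on can be sketched as follows. This is a minimal, naive rotation-sort construction in Python for illustration only; the sentinel character `$` and the function name are conventional choices, not details taken from the paper, and succinct indexes of the kind the paper describes build the BWT far more efficiently via suffix arrays.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations.

    Naive O(n^2 log n) construction for illustration; practical
    systems derive the BWT from a suffix array in near-linear time.
    """
    s = text + sentinel  # unique terminator, lexicographically smallest
    # Sort all cyclic rotations of the terminated string.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))  # -> annb$aa
```

The transformed string tends to cluster equal symbols together, which is what makes BWT-based indexes both compressible and efficiently searchable with rank/select queries over a wavelet tree.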
Ktistakis, R., Fournier-Viger, P., Puglisi, S. J., & Raman, R. (2019). Succinct BWT-Based Sequence Prediction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11707 LNCS, pp. 91–101). Springer. https://doi.org/10.1007/978-3-030-27618-8_7