Succinct dictionary matching with no slowdown

Djamal Belazzougui

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6129 LNCS 88-100

DOI: 10.1007/978-3-642-13509-5_9

Abstract

Dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size σ, build a data structure so that we can find in any text T all occurrences of strings belonging to S. The classical solution to this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a representation that occupies O(m log m) bits of space, where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log σ + O(1)) + O(d log(n/d)) bits of space while still answering queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses O(n log σ) bits of space while answering queries in O(|T| log log n + occ) time. In the paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) bits, where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m^ε for any constant 0 < ε < 1. The query time remains unchanged. © Springer-Verlag Berlin Heidelberg 2010.
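
For orientation, the classical baseline that the abstract refers to can be sketched as follows. This is a plain pointer-based Aho-Corasick automaton, not the succinct representation proposed in the paper; it corresponds roughly to the O(m log m)-bit layout, with hash maps standing in for the transition function. The names build_automaton, find_all, and the report links are illustrative choices made here, not taken from the paper, and patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build a plain (non-succinct) Aho-Corasick automaton over non-empty patterns."""
    goto = [{}]   # goto[s][c] -> child of state s on character c (the trie)
    fail = [0]    # fail[s] -> state of the longest proper suffix of s in the trie
    out = [[]]    # out[s] -> indices of patterns that end exactly at state s
    for idx, pat in enumerate(patterns):
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(idx)

    # report[s] -> nearest state on the failure chain of s that has output (0 if none).
    # Following report links keeps the reporting cost proportional to occ.
    report = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        s = queue.popleft()
        f = fail[s]
        report[s] = f if out[f] else report[f]
        for c, t in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            queue.append(t)
    return goto, fail, out, report


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence of a pattern in text."""
    goto, fail, out, report = build_automaton(patterns)
    matches = []
    s = 0
    for i, c in enumerate(text):
        # Follow failure links until a transition on c exists or the root is reached.
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        # Report every pattern that ends at position i.
        r = s if out[s] else report[s]
        while r:
            for idx in out[r]:
                matches.append((i - len(patterns[idx]) + 1, patterns[idx]))
            r = report[r]
    return matches


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
```

Each state here stores explicit integer references to its children, its failure state, and its report state, which is where the log m factor per state comes from. The paper's contribution is to replace this layout with one of m(log σ + O(1)) + O(d log(n/d)) bits without giving up the O(|T| + occ) query time.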

Cite

CITATION STYLE

APA

Belazzougui, D. (2010). Succinct dictionary matching with no slowdown. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6129 LNCS, pp. 88–100). https://doi.org/10.1007/978-3-642-13509-5_9
