Succinct dictionary matching with no slowdown

Abstract

Dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over a (not necessarily constant) alphabet of size σ, build a data structure so that we can find in any text T all occurrences of strings belonging to S. The classical solution to this problem is the Aho-Corasick automaton, which finds all occ occurrences in a text T in time O(|T| + occ) using a representation that occupies O(m log m) bits of space, where m ≤ n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log σ + O(1)) + O(d log(n/d)) bits of space while still answering queries in O(|T| + occ) time. To the best of our knowledge, the fastest current succinct data structure for the dictionary matching problem uses O(n log σ) bits of space while answering queries in O(|T| log log n + occ) time. In the paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) bits, where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that σ < m^ε for any constant 0 < ε < 1. The query time remains unchanged. © Springer-Verlag Berlin Heidelberg 2010.
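For orientation, the classical baseline that the abstract compares against can be sketched as follows. This is a minimal, pointer-based Aho-Corasick automaton of the kind that occupies O(m log m) bits; it is not the succinct representation proposed in the paper. The names build_automaton and find_all, and the list-of-dicts layout, are illustrative assumptions rather than anything taken from the source.

```python
from collections import deque


def build_automaton(patterns):
    """Build a classical (non-succinct) Aho-Corasick automaton.

    Illustrative sketch only: states are integers, state 0 is the root.
    """
    goto = [{}]   # goto[s][c] -> child of state s on character c (the trie)
    fail = [0]    # fail[s] -> state for the longest proper suffix of s in the trie
    out = [[]]    # out[s] -> indices of all patterns that end at state s

    # Phase 1: insert every pattern into the trie.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Phase 2: breadth-first traversal to set failure links.
    # Depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches reachable through the failure link, so that
            # reporting costs time proportional to the number of occurrences.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists (or we hit the root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            matches.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return matches
```

With the dictionary S = {he, she, his, hers} (d = 4, n = 12), this sketch builds m = 10 states, consistent with m ≤ n + 1, and find_all("ushers", ...) reports "she" starting at index 1 and both "he" and "hers" starting at index 2. Each state here stores explicit pointers for its transitions, failure link, and output list, which is the source of the O(m log m)-bit cost that the paper reduces.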

Cite

Belazzougui, D. (2010). Succinct dictionary matching with no slowdown. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6129 LNCS, pp. 88–100). https://doi.org/10.1007/978-3-642-13509-5_9
