Bit-coded regular expression parsing

Abstract

Regular expression parsing is the problem of producing a parse tree of a string for a given regular expression. We show that a compact bit representation of a parse tree can be produced efficiently, in time linear in the product of input string size and regular expression size, by simplifying the DFA-based parsing algorithm due to Dubé and Feeley to emit the bits of the bit representation without explicitly materializing the parse tree itself. We furthermore show that Frisch and Cardelli's greedy regular expression parsing algorithm can be straightforwardly modified to produce bit codings directly. We implement both solutions as well as a backtracking parser and perform benchmark experiments to gauge their practical performance. We observe that our DFA-based solution can be significantly more time and space efficient than the Frisch-Cardelli algorithm due to its sharing of DFA nodes, but that the latter may still perform better on regular expressions that are "more deterministic" from the right than the left. (Backtracking is, unsurprisingly, quite hopeless.) © 2011 Springer-Verlag.
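To make the idea of a bit-coded parse tree concrete, here is a small Python sketch. It assumes the coding as we understand it: an alternative emits 0 for the left branch and 1 for the right, a Kleene star emits 0 before each iteration and 1 to terminate, and symbols, sequencing and the empty string emit nothing. The exact bit assignment is an assumption. All names (`Eps`, `Sym`, `Alt`, `Seq`, `Star`, `Inl`, `Inr`, `encode`, `decode`, `flatten`, `parse`) are hypothetical. `parse` is only a naive first-match backtracking parser that emits bits directly. It is exponential in the worst case and is neither the DFA-based algorithm nor the Frisch-Cardelli algorithm.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical regex AST.
@dataclass(frozen=True)
class Eps:
    pass
@dataclass(frozen=True)
class Sym:
    c: str
@dataclass(frozen=True)
class Alt:
    left: Re
    right: Re
@dataclass(frozen=True)
class Seq:
    left: Re
    right: Re
@dataclass(frozen=True)
class Star:
    body: Re

Re = Union[Eps, Sym, Alt, Seq, Star]

# Parse trees: Eps -> (), Sym -> the char, Alt -> Inl/Inr, Seq -> pair, Star -> list.
@dataclass(frozen=True)
class Inl:
    v: object
@dataclass(frozen=True)
class Inr:
    v: object

def encode(r: Re, v) -> List[int]:
    """Bit code of parse tree v for r (assumed coding scheme)."""
    if isinstance(r, (Eps, Sym)):
        return []                      # determined by r alone, no bits
    if isinstance(r, Alt):
        if isinstance(v, Inl):
            return [0] + encode(r.left, v.v)
        return [1] + encode(r.right, v.v)
    if isinstance(r, Seq):
        return encode(r.left, v[0]) + encode(r.right, v[1])
    bits: List[int] = []
    for x in v:                        # Star: 0 before each iteration...
        bits += [0] + encode(r.body, x)
    return bits + [1]                  # ...and 1 to stop

def decode(r: Re, bits: List[int], i: int = 0) -> Tuple[object, int]:
    """Rebuild the tree from bits[i:], guided by r; returns (tree, next index)."""
    if isinstance(r, Eps):
        return (), i
    if isinstance(r, Sym):
        return r.c, i
    if isinstance(r, Alt):
        if bits[i] == 0:
            v, j = decode(r.left, bits, i + 1)
            return Inl(v), j
        v, j = decode(r.right, bits, i + 1)
        return Inr(v), j
    if isinstance(r, Seq):
        v1, j = decode(r.left, bits, i)
        v2, k = decode(r.right, bits, j)
        return (v1, v2), k
    items = []
    while bits[i] == 0:
        v, i = decode(r.body, bits, i + 1)
        items.append(v)
    return items, i + 1

def flatten(v) -> str:
    """The string spelled out by a parse tree."""
    if isinstance(v, str):
        return v
    if isinstance(v, (Inl, Inr)):
        return flatten(v.v)
    return "".join(flatten(x) for x in v)  # (), pairs and star lists

def parse(r: Re, s: str) -> Optional[List[int]]:
    """Naive backtracking parser emitting bits of the first full match, or None."""
    K = Callable[[int], Optional[List[int]]]
    def go(r: Re, i: int, k: K) -> Optional[List[int]]:
        if isinstance(r, Eps):
            return k(i)
        if isinstance(r, Sym):
            return k(i + 1) if i < len(s) and s[i] == r.c else None
        if isinstance(r, Alt):
            res = go(r.left, i, k)     # try the left branch first
            if res is not None:
                return [0] + res
            res = go(r.right, i, k)
            return None if res is None else [1] + res
        if isinstance(r, Seq):
            return go(r.left, i, lambda j: go(r.right, j, k))
        def loop(j: int) -> Optional[List[int]]:
            # Greedy: try one more iteration, which must consume input.
            res = go(r.body, j, lambda m: loop(m) if m > j else None)
            if res is not None:
                return [0] + res
            res = k(j)
            return None if res is None else [1] + res
        return loop(i)
    return go(r, 0, lambda i: [] if i == len(s) else None)
```

For example, with `r = Star(Alt(Sym("a"), Sym("b")))`, the tree for `"ab"` is `[Inl("a"), Inr("b")]` and its code is `[0, 0, 0, 1, 1]`. The code stores no characters. The regular expression and the bits together determine both the tree and the string, which is where the compactness comes from.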

Citation (APA)

Nielsen, L., & Henglein, F. (2011). Bit-coded regular expression parsing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6638 LNCS, pp. 402–413). Springer Verlag. https://doi.org/10.1007/978-3-642-21254-3_32