Faster general parsing through context-free memoization

Grzegorz Herman

Conference Proceedings · Open Access


Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (2020) 1022-1035

DOI: 10.1145/3385412.3386032

Abstract

We present a novel parsing algorithm for all context-free languages. The algorithm has a clean mathematical formulation: parsing is expressed as a series of standard operations on regular languages and relations. Its complexity with respect to input length matches the state of the art: worst-case cubic, quadratic for unambiguous grammars, and linear for LR-regular grammars. What distinguishes our approach is that parsing can be implemented using only immutable, acyclic data structures. We also propose a parsing optimization technique called context-free memoization. It makes it possible to handle the overwhelming majority of input symbols using a simple stack and a lookup table, much like a deterministic LR(1) parser. As a result, our proof-of-concept implementation outperforms the best current implementations of common generalized parsing algorithms (Earley, GLR, and GLL). On a large corpus of Java source code, parsing is 3–5 times faster and recognition is 35 times faster.
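The abstract compares the common case of context-free memoization to a deterministic LR(1) parser driven by a stack and a lookup table, and stresses that parsing can use only immutable, acyclic data structures. As a rough illustration of those two ingredients only (not of the paper's algorithm, which this page does not describe), the sketch below runs an ordinary LR(1) shift-reduce loop whose stack is a chain of immutable cells. The toy grammar, the hand-computed `ACTION`/`GOTO` tables, and the names `Node` and `parse` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    """One immutable stack cell; `below` points to the older part of the stack."""
    state: int
    value: Any
    below: Optional["Node"]


# Hypothetical toy grammar (not from the paper):
#   rule 1: E -> E + n
#   rule 2: E -> n
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule -> (left-hand side, right-hand-side length)

# LR(1) lookup table, computed by hand for the toy grammar:
# (state, lookahead) -> ("s", next_state) | ("r", rule) | ("acc",)
ACTION = {
    (0, "n"): ("s", 2),
    (1, "+"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "+"): ("r", 2),
    (2, "$"): ("r", 2),
    (3, "n"): ("s", 4),
    (4, "+"): ("r", 1),
    (4, "$"): ("r", 1),
}
GOTO = {(0, "E"): 1}


def parse(tokens):
    """Deterministic shift-reduce loop; returns the parse tree as nested tuples."""
    stack = Node(0, None, None)
    tokens = list(tokens) + ["$"]  # "$" marks end of input
    i = 0
    while True:
        act = ACTION.get((stack.state, tokens[i]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[i]!r} at position {i}")
        if act[0] == "s":
            # Push: allocate a new cell; the old stack is left untouched.
            stack = Node(act[1], tokens[i], stack)
            i += 1
        elif act[0] == "r":
            lhs, length = RULES[act[1]]
            children = []
            for _ in range(length):
                children.append(stack.value)
                stack = stack.below  # Pop: just follow the pointer.
            children.reverse()
            tree = (lhs, tuple(children))
            stack = Node(GOTO[(stack.state, lhs)], tree, stack)
        else:
            return stack.value  # Accept: the top cell holds the tree for E.


print(parse(["n", "+", "n"]))  # ('E', (('E', ('n',)), '+', 'n'))
```

Because a push allocates a fresh cell and a pop only follows `below`, every earlier version of the stack remains valid and cells can never form a cycle. That property is what makes such stacks easy to share between several parse branches without copying; how the paper itself exploits it is not shown here.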

Cite (APA)

Herman, G. (2020). Faster general parsing through context-free memoization. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (pp. 1022–1035). Association for Computing Machinery. https://doi.org/10.1145/3385412.3386032