We show how to turn a regular expression R of length r into an O(s) space representation of McNaughton and Yamada’s NFA, where s is the number of occurrences of alphabet symbols in R, and s+1 is the number of NFA states. The standard adjacency list representation of McNaughton and Yamada’s NFA takes up s + s² space in the worst case. The adjacency list representation of the NFA produced by Thompson takes up between 2r and 6r space, where r can be arbitrarily larger than s. Given any set V of NFA states, our representation can be used to compute the set U of states one transition away from the states in V in optimal time O(|V| + |U|). The same calculation requires Θ(|V| × |U|) time in the worst case on McNaughton and Yamada’s NFA, and Θ(r) time in the worst case on Thompson’s NFA. An implementation of our NFA representation confirms that it takes up an order of magnitude less space than McNaughton and Yamada’s machine. An implementation to produce a DFA from our NFA representation by subset construction shows linear and quadratic speedups over subset construction starting from both Thompson’s and McNaughton and Yamada’s NFA’s. It also shows that the DFA produced from our NFA is as much as one order of magnitude smaller than DFA’s constructed from the two other NFA’s. Throughout this paper the importance of syntax is stressed in the design of our algorithms. In particular, we exploit a method of program improvement in which costly repeated calculations can be avoided by establishing and maintaining program invariants. This method of symbolic finite differencing has been used previously by Douglas Smith to derive efficient functional programs.
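For reference, the abstract measures its compressed representation against the standard McNaughton and Yamada NFA, whose states are one start state plus the s positions of alphabet symbols in R (the same machine is often called the Glushkov or position automaton). The sketch below builds only that standard, uncompressed baseline from nullable/first/last/follow sets and then runs subset construction on it. It is not the paper’s O(s) representation: follow sets are stored explicitly, which is where the s + s² worst-case space comes from. Assumptions: symbols are single characters, the only operators are `|`, `*`, parentheses and implicit concatenation, input is well formed with no empty operands, and the names `parse`, `analyze`, `to_dfa`, `matches` and `START` are hypothetical.

```python
def parse(regex):
    """Hypothetical parser producing ("sym", c), ("star", x), ("cat", x, y),
    ("alt", x, y). Assumes well-formed input; there is no error handling."""
    pos = 0

    def alt():
        nonlocal pos
        node = cat()
        while pos < len(regex) and regex[pos] == "|":
            pos += 1
            node = ("alt", node, cat())
        return node

    def cat():
        node = starred()
        while pos < len(regex) and regex[pos] not in "|)":
            node = ("cat", node, starred())
        return node

    def starred():
        nonlocal pos
        c = regex[pos]
        pos += 1
        if c == "(":
            node = alt()
            pos += 1  # skip the matching ')' (assumed present)
        else:
            node = ("sym", c)
        while pos < len(regex) and regex[pos] == "*":
            pos += 1
            node = ("star", node)
        return node

    return alt()


def analyze(tree):
    """Number symbol occurrences left to right (positions 0..s-1) and
    compute nullable, first, last and explicit follow sets."""
    symbols = []  # symbols[p]: alphabet symbol at position p
    follow = []   # follow[p]: positions that can immediately follow p

    def walk(n):  # returns (nullable, first, last) for subtree n
        if n[0] == "sym":
            p = len(symbols)
            symbols.append(n[1])
            follow.append(set())
            return False, {p}, {p}
        if n[0] == "star":
            _, fst, lst = walk(n[1])
            for p in lst:  # loop edge: last(x) -> first(x)
                follow[p] |= fst
            return True, fst, lst
        n1, f1, l1 = walk(n[1])  # left first keeps positions in order
        n2, f2, l2 = walk(n[2])
        if n[0] == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:  # concatenation edge: last(x) -> first(y)
            follow[p] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last

    nullable, first, last = walk(tree)
    return symbols, nullable, first, last, follow


def to_dfa(regex):
    """Subset construction over the position NFA. DFA states are frozensets
    of NFA states; START (-1) stands for the NFA's single initial state."""
    symbols, nullable, first, last, follow = analyze(parse(regex))
    START = -1
    start = frozenset({START})
    dfa, accepting, work = {}, set(), [start]
    while work:
        S = work.pop()
        if S in dfa:
            continue
        # U = union of successors of the NFA states in S, split by symbol.
        # With explicit follow sets this scans sum(|follow(p)|) entries.
        moves = {}
        for state in S:
            for q in (first if state == START else follow[state]):
                moves.setdefault(symbols[q], set()).add(q)
        dfa[S] = {a: frozenset(T) for a, T in moves.items()}
        if (START in S and nullable) or S & last:
            accepting.add(S)
        work.extend(T for T in dfa[S].values() if T not in dfa)
    return start, dfa, accepting


def matches(regex, text):
    """Run the DFA over text; a missing transition means rejection."""
    start, dfa, accepting = to_dfa(regex)
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in accepting
```

For example, `matches("(a|b)*abb", "aabb")` returns `True` and `matches("(a|b)*abb", "abab")` returns `False`. In `(a|b)*abb`, positions 0 and 1 have identical follow sets {0, 1, 2}. With explicit sets that overlap is stored once per position, and the inner loop of `to_dfa` rescans it for every position in S. This is why the cost of computing U is the sum of |follow(p)| over p in V, which can reach the Θ(|V| × |U|) worst case quoted above. Avoiding that repeated work is the purpose of the compressed representation, which this baseline does not attempt.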
Chang, C. H., & Paige, R. (1992). From regular expressions to DFA’s using compressed NFA’s. In Lecture Notes in Computer Science (Vol. 644, pp. 90–110). Springer-Verlag. https://doi.org/10.1016/s0304-3975(96)00140-5