The Standard Generalized Markup Language (SGML) is an ISO standard that provides a syntactic meta-language for the definition of textual markup systems, which are used to indicate the structure of documents so that they can be electronically typeset, searched, and communicated. We address only one problem raised by the standard: in SGML, the right-hand sides of context-free productions are regular expressions, called content models, that are restricted to be what the standard calls 'unambiguous,' a property that is more appropriately called 'deterministic.' We show how to define determinism precisely, how to recognize deterministic regular expressions efficiently, and how to recognize deterministic regular languages. Any SGML parser must check that a given document grammar conforms to the standard; that is, it must validate the grammar. Hence, our results are an important step toward clarifying the standard and toward implementing SGML parsers for SGML document grammars efficiently.
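To make the notion of a deterministic content model concrete, the following is a minimal sketch in Python. It assumes the standard position-automaton (Glushkov) characterization: an expression is deterministic if, among the positions that can start a word, and among the positions that can follow any given position, no two distinct positions carry the same symbol. The abstract does not spell out this construction, so it is offered only as one plausible formalization. The constructor names (`sym`, `cat`, `alt`, `star`, `opt`) and the function `is_deterministic` are hypothetical, the SGML `&` connector is not handled, and the check is a naive one rather than the efficient algorithm the abstract refers to.

```python
from itertools import count

# Hypothetical constructors for a tiny content-model AST (not from the paper).
def sym(a): return ("sym", a)
def cat(l, r): return ("cat", l, r)    # sequence:  l , r
def alt(l, r): return ("alt", l, r)    # choice:    l | r
def star(e): return ("star", e)        # repetition: e*
def opt(e): return ("opt", e)          # optional:   e?

def is_deterministic(expr):
    """Naive check: no two distinct positions with the same symbol compete."""
    labels = {}   # position -> symbol at that position
    follow = {}   # position -> positions that may come immediately after it
    fresh = count()

    def walk(node):
        """Return (nullable, first, last) for node; fill labels and follow."""
        kind = node[0]
        if kind == "sym":
            p = next(fresh)
            labels[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "cat":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            for p in l1:                      # end of left flows into start of right
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "alt":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "star":
            _, f, l = walk(node[1])
            for p in l:                       # loop back from end to start
                follow[p] |= f
            return True, f, l
        if kind == "opt":
            _, f, l = walk(node[1])
            return True, f, l
        raise ValueError(f"unknown node kind: {kind!r}")

    def clash(positions):
        """True if two distinct positions in the set share a symbol."""
        seen = set()
        for p in positions:
            if labels[p] in seen:
                return True
            seen.add(labels[p])
        return False

    _, first, _ = walk(expr)
    return not clash(first) and not any(clash(s) for s in follow.values())

# (a|b)*a : on reading 'a' it is unclear which occurrence of 'a' was matched.
ambiguous = cat(star(alt(sym("a"), sym("b"))), sym("a"))
# b*a(b*a)* : same language, but every symbol read has a unique matching position.
rewritten = cat(star(sym("b")), cat(sym("a"), star(cat(star(sym("b")), sym("a")))))
```

Under these assumptions, `is_deterministic(ambiguous)` is false and `is_deterministic(rewritten)` is true, which illustrates that determinism is a property of the expression rather than of the language alone.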
Brüggemann-Klein, A., & Wood, D. (1997). The validation of SGML content models. Mathematical and Computer Modelling, 25(4), 73–84. https://doi.org/10.1016/S0895-7177(97)00025-3