Unambiguity of extended regular expressions in SGML document grammars

18Citations
Citations of this article
4Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In the Standard Generalized Markup Language (SGML), document types are defined by context-free graznmars in an extended Backus-Naur form. The right-hand side of a production is called a content model. Content models are extended regular expressions that have to be unambiguous in the sense that “an element… that occurs in the document instance must be able to satisfy only one primitive content token without looking ahead in the document instance.” In this paper, we present a linear-time algorithm that decides whether a given content model is unambiguous. A similar result has previously been obtained not for content models but for the smaller class of standard regular expressions. It reties on the fact that the languages of marked regular expressions are local—a property that does not hold any more for content models that contain the new &-operator. Therefore, it is necessary to develop new techniques for content models. Besides solving an interesting problem in formal language theory, our results are relevant for developers of SGML systems. In fact, our definitions are causing changes to the revised edition of the SGML standard, and the algorithm to test content models for unambiguity has been implemented in an SGML parser.

Cite

CITATION STYLE

APA

Brüggemann-Klein, A. (1993). Unambiguity of extended regular expressions in SGML document grammars. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 726 LNCS, pp. 73–84). Springer Verlag. https://doi.org/10.1007/3-540-57273-2_45

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free