Language Convergence Infrastructure
Vadim Zaytsev
Software Languages Team, Universität Koblenz-Landau, Germany
zaytsev.vadim@gmail.com
Abstract. The process of grammar convergence involves grammar extraction
and transformation for structural equivalence and contains a range of
technical challenges. These need to be addressed in order for the method
to deliver useful results. The paper describes a DSL and the infrastructure
behind it that automates the convergence process, hides negligible back-end
details, aids development/debugging and enables application of grammar
convergence technology to large scale projects. The necessity of having a
strong framework is explained by listing case studies. Domain elements such
as extractors and transformation operators are described to illustrate the
issues that were successfully addressed.
1 Introduction
The method of grammar convergence has been presented in [15] and elaborated
in a large case study [16], with a journal version being in print. The basic idea
behind it is to extract grammars from available grammar artefacts, transform
them until they become identical, and draw conclusions from the properties
of the transformation chain: its length, the type of steps it consisted of, the
correspondence with the properties expected a priori from documentation, etc.
Grammar convergence can be used, among other ways, to establish an agreement
between a hand-crafted object model for a specific domain and an XML Schema
for standard serialisation of the same domain; to prove that various grammarware
such as parsers, code analysers and reverse engineering tools agree on the
language; to synchronise the language definition in the manual with the reference
implementation; and to aid in disciplined grammar adaptation.
Fig. 1. The megamodel of SLPS: every vertex is a language, every arc is a
language transformation. Thin grey lines denote tools present prior to this
research, e.g., GDK [13] or TXL [3]. Thick grey edges are for co-authored
transformations.
J.M. Fernandes et al. (Eds.): GTTSE 2009, LNCS 6491, pp. 481–497, 2011.
© Springer-Verlag Berlin Heidelberg 2011
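The basic idea stated at the start of this section (extract, transform until identical, then inspect the chain) can be illustrated with a minimal Python sketch. This is not the SLPS implementation: the dictionary-based grammar representation, the `rename` operator and the `converge` helper are hypothetical names introduced here purely for illustration, and the two toy grammars are invented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy grammar: nonterminal -> list of alternatives, each a tuple of symbols.
# (An illustrative representation, not the BGF format.)
Grammar = Dict[str, List[Tuple[str, ...]]]


@dataclass
class Step:
    kind: str                             # label recorded in the chain
    apply: Callable[[Grammar], Grammar]   # the transformation itself


def rename(old: str, new: str) -> Step:
    """Hypothetical operator: rename a nonterminal everywhere it occurs."""
    def run(g: Grammar) -> Grammar:
        sub = lambda s: new if s == old else s
        return {sub(n): [tuple(sub(s) for s in alt) for alt in alts]
                for n, alts in g.items()}
    return Step("rename", run)


def normalise(g: Grammar) -> Dict[str, List[Tuple[str, ...]]]:
    """Ignore the order of alternatives when comparing grammars."""
    return {n: sorted(alts) for n, alts in g.items()}


def converge(source: Grammar, target: Grammar, chain: List[Step]):
    """Apply steps until the grammars are identical; report the steps used."""
    current, applied = source, []
    for step in chain:
        if normalise(current) == normalise(target):
            break
        current = step.apply(current)
        applied.append(step.kind)
    return normalise(current) == normalise(target), applied


# Two invented grammars for the same tiny language, e.g. one extracted
# from a parser specification and one from a manual.
from_parser: Grammar = {
    "expr": [("term", "+", "expr"), ("term",)],
    "term": [("NUM",)],
}
from_manual: Grammar = {
    "Expression": [("Term",), ("Term", "+", "Expression")],
    "Term": [("NUM",)],
}

identical, kinds = converge(
    from_parser, from_manual,
    [rename("expr", "Expression"), rename("term", "Term")],
)
print(identical, len(kinds), kinds)
```

In this sketch the chain length and the kinds of steps are the properties one would inspect afterwards: a chain consisting only of renamings suggests that the two sources differ only in naming conventions.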
In this paper we will use the terms “grammar convergence” and “language
convergence” almost interchangeably. In fact, language convergence is a broader
term that includes convergence of not only the syntax, but also parse trees,
documentation, possibly even semantics. We focus on dealing with grammars
here, but the reader interested in consistency management for language
specifications can imagine additional automated steps like extracting a grammar
from the language document before the transformation and inserting it back
afterwards [12,14].
Language convergence was developed and implemented as a part of an open
source project called SLPS, or Software Language Processing Suite¹. It comprises
several stand-alone scripts targeting comparison, transformation, benchmarking,
validation, extraction and pretty-printing. Most of those scripts were written in
Python, Prolog, Shell and XSLT. Grammar convergence is a complicated process
that can only be automated partially and therefore requires expert knowledge
to be used successfully. In order to simplify the work of a grammar engineer, a
specific technical infrastructure is needed, with a solid suite of transformation
operators, stably defined internal notations and powerful tool support for every
stage. This paper presents such a framework and explains both engineering and
scientific design choices behind it.
Figure 1 presents a “megamodel” [2] of SLPS. Every arc in this graph
is a language transformation tool or a sequence of pipelined tools. Many of
the new DSLs developed for this infrastructure are in fact XML-based: BGF,
XBGF, BTF, XBTF, LDF, XLDF, LCF. This is just an engineering decision that
lets them profit fully from XMLware facilities like validation against schemata
and transformation with pattern matching. (These advantages are not unique to
XML, of course.) Others are mostly well-known languages that existed prior to
this research: ANTLR [18], SDF [9], LLL [13], XSD [5], etc.
The left-hand side of the megamodel is mostly dedicated to language
documentation-related components: LDF is a Language Document Format [23],
an extension of the grammar notation that covers the most commonly encountered
elements of language manuals and specifications. The central part contains the
grammar notation itself: the BGF node has a big fan-in since every incoming
arc represents a grammar extraction tool (see §4.1). The only outgoing arcs are
the main presentation forms (pure text, marked-up LaTeX and a graph form), plus
transformation generators (see §5.4) and integration tools (see §6.3).
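The reading of the megamodel as a graph (vertices are languages, arcs are tools, a path is a pipeline of tools) can be made concrete with a small Python sketch. The arc list below is an assumption made for illustration only: it is not a transcription of Fig. 1, and the tool labels and the helper names `fan_in`, `fan_out` and `pipeline` are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

# (source language, target language, kind of tool) -- illustrative arcs only,
# loosely following the description in the text, not the actual figure.
Arc = Tuple[str, str, str]
ARCS: List[Arc] = [
    ("ANTLR", "BGF", "extractor"),
    ("SDF", "BGF", "extractor"),
    ("LLL", "BGF", "extractor"),
    ("XSD", "BGF", "extractor"),
    ("BGF", "text", "pretty-printer"),
    ("BGF", "LaTeX", "pretty-printer"),
    ("BGF", "graph", "pretty-printer"),
]


def fan_in(arcs: List[Arc], vertex: str) -> List[str]:
    """Languages that have a tool leading directly into `vertex`."""
    return sorted(src for src, dst, _ in arcs if dst == vertex)


def fan_out(arcs: List[Arc], vertex: str) -> List[str]:
    """Languages directly reachable from `vertex` by one tool."""
    return sorted(dst for src, dst, _ in arcs if src == vertex)


def pipeline(arcs: List[Arc], start: str, goal: str) -> Optional[List[str]]:
    """Shortest sequence of languages from `start` to `goal` (breadth-first)."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for src, dst, _ in arcs:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None  # no pipeline of tools connects the two languages


print(fan_in(ARCS, "BGF"))
print(pipeline(ARCS, "ANTLR", "LaTeX"))
```

Under these assumed arcs, every extractor contributes one incoming arc to BGF, which is why the central node accumulates a large fan-in while needing only a handful of outgoing presentation arcs.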
1 Software Language Processing Suite: http://slps.sf.net


