Ott: Effective tool support for the working semanticist
- ISSN: 09567968
- ISBN: 9781595938152
- DOI: 10.1017/S0956796809990293
Abstract
Semantic definitions of full-scale programming languages are rarely given, despite the many potential benefits. Partly this is because the available metalanguages for expressing semantics A usually either LTEX for informal mathematics or the formal mathematics of a proof assistant make it much harder than necessary to work with large definitions. We present a metalanguage specifically designed for this problem, and a tool, Ott, that sanity-checks such definitions and compiles them into proof assistant code for Coq, HOL, and Isabelle/HOL, together with A LTEX code for production-quality typesetting, and OCaml boilerplate. The main innovations are (1) metalanguage design to make definitions concise, and easy to read and edit; (2) an expressive but intuitive metalanguage for specifying binding structures; and (3) compilation to proof assistant code. This has been tested in substantial case studies, including modular specifications of calculi from the TAPL text, a Lightweight Java with Java JSR 277/294 module system proposals, and a large fragment of OCaml (OCamllight , 310 rules), with mechanised proofs of various soundness results. Our aim with this work is to enable a phase change: making it feasible to work routinely, without heroic effort, with rigorous semantic definitions of realistic languages.
Ott: Effective tool support for the working semanticist
Peter Sewell∗ Francesco Zappa Nardelli† Scott Owens∗ Gilles Peskine∗
Thomas Ridge∗ Susmit Sarkar∗ Rok Strnisa∗
∗University of Cambridge †INRIA
http://www.cl.cam.ac.uk/users/pes20/ott
Abstract
It is rare to give a semantic definition of a full-scale programming
language, despite the many potential benefits. Partly this is because
the available metalanguages for expressing semantics — usually
either LATEX for informal mathematics, or the formal mathematicsof a proof assistant — make it much harder than necessary to work
with large definitions.
We present a metalanguage specifically designed for this prob-
lem, and a tool, ott, that sanity-checks such definitions and com-
piles them into proof assistant code for Coq, HOL, Isabelle, and (in
progress) Twelf, together with LATEX code for production-qualitytypesetting, and OCaml boilerplate. The main innovations are: (1)
metalanguage design to make definitions concise, and easy to read
and edit; (2) an expressive but intuitive metalanguage for specify-
ing binding structures; and (3) compilation to proof assistant code.
This has been tested in substantial case studies, including mod-
ular specifications of calculi from the TAPL text, a Lightweight
Java with Java JSR 277/294 module system proposals, and a large
fragment of OCaml (around 306 rules), with machine proofs of var-
ious soundness results. Our aim with this work is to enable a phase
change: making it feasible to work routinely, without heroic effort,
with rigorous semantic definitions of realistic languages.
Categories and Subject Descriptors D.3.1 [Programming Lan-guages]: Formal Definitions and Theory
General Terms Languages, Theory, Verification, Standardization
1. Introduction
Problem Writing a precise semantic definition of a full-scale
programming language is a challenging task that has been done
only rarely, despite the many potential benefits. Indeed, Standard
ML remains, 17 years after publication, the shining example of a
language that is defined precisely and is at all widely used (Milner
et al. 1990). Even languages such as Haskell (Peyton Jones 2003)
and OCaml (Leroy et al. 2005), though designed by PL researchers
and in large part based on mathematical papers, rely on prose
descriptions.
Precise semantic definitions are rare for several reasons, but one
important reason is that the metalanguages that are available for ex-
pressing semantic definitions are not designed for this application,
making it much harder than necessary to work with large defini-
tions. There are two main choices for a metalanguage:
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
ICFP’07, October 1–3, 2007, Freiburg, Germany.
Copyright c© 2007 ACM 978-1-59593-815-2/07/0010. . . $5.00.
(1) Informal mathematics, expressed in LATEX (by far the mostcommon option).
(2) Formalised mathematics, in the language of a proof assistant
such as Coq, HOL, Isabelle, or Twelf (Coq; HOL; Isabelle;
Twelf).
For a small calculus either can be used without much difficulty.
A full language definition, however, might easily be 100 pages or
10 000 lines. At this scale the syntactic overhead of LATEX markupbecomes very significant, getting in the way of simply reading and
writing the definition source. The absence of automatic checking
of sanity properties becomes a severe problem — in our experi-
ence with the Acute language (Sewell et al. 2004), just keeping a
large definition internally syntactically consistent during develop-
ment is hard, and informal proof becomes quite unreliable. Further,
there is no support for relating the definition to an implementation,
either for generating parts of an implementation, or for testing con-
formance. Accidental errors are almost inescapable (Kahrs 1993;
Rossberg 2001).
Proof assistants help with automatic checking, but come with
their own problems. The sources of definitions are still cluttered
with syntactic noise, non-trivial encodings are often needed (e.g. to
deal with subgrammars and binding, and to work around limita-
tions of the available polymorphism and inductive definition sup-
port), and facilities for parsing and pretty printing terms of the
source language are limited. Typesetting of definitions is supported
only partially and only in some proof assistants, so one may have
the problem of maintaining machine-readable and human-readable
versions of the specification, and keeping them in sync. Moreover,
each proof assistant has its own (steep) learning curve, the commu-
nity is partitioned into schools (few people are fluent in more than
one), and one has to commit to a particular proof assistant from the
outset of a project.
A more subtle consequence of the limitations of the available
metalanguages is that they obstruct re-use of definitions across the
community, even of small calculi. Research groups each have their
own private LATEX macros and idioms — to build on a publishedcalculus, one would typically re-typeset it (possibly introducing
minor hopefully-inessential changes in the process). Proof assistant
definitions are more often made available (e.g. in the Archive of
Formal Proofs (AFP)), but are specific to a single proof assistant.
Both styles of definition make it hard to compose semantics in a
modular way, from fragments.
The Dream What, then, is the ideal? We would like to have a
metalanguage that is designed for the working semanticist, support-
ing common notations that have been developed over the years. In
an email or working note one might write grammars for languages
with complex binding structures, for example
t ::=
| let p = t in t’ bind binders(p) in t’
p ::=
| x binders = x
| { l1=p1,...,ln=pn } binders = binders(p1 ... pn)
G |- t1:T1 ... G |- tn:Tn
------------------------------------------
G |- {l1=t1,...,ln=tn} : {l1:T1,...,ln:Tn}
These are intuitively clear, concise, and easy to read and edit.
Sadly, they lack both the precision of proof assistant definitions
and the production-quality typesetting of LATEX — but one mighthope that only a modicum of information need be added to make
them precise, and to automatically compile them to both targets.
Contribution We realize this dream: we describe a metalan-
guage specifically designed for writing semantic definitions, and
a tool, ott, that sanity-checks such definitions and compiles them
into proof assistant code, LATEX code for production-quality type-setting, and OCaml boilerplate for implementations. The main in-
novations are:
• Metalanguage design to make denitions concise and easy to
read and edit (§2). The metalanguage lets one specify the syn-
tax of an object language, together with rules defining inductive
relations, for semantic judgements. Making these easy to express
demands rather different syntactic choices to those of typical pro-
gramming languages. The tool builds parsers and pretty-printers for
symbolic and concrete terms of the object language.
• An expressive metalanguage (but one that remains simple and
intuitive) for specifying binding (§3). Nontrivial object lan-
guages often involve complex forms of binding: not just the sin-
gle binders of lambda terms, which have received much attention,
but also structured patterns, multiple mutually recursive let def-
initions, or-patterns, dependent record patterns, etc. We introduce
a metalanguage that can express all these but that remains close to
informal practice. We give this two interpretations. Firstly, we de-
fine substitution and free variable functions for a “fully concrete”
representation, not quotiented by alpha equivalence. This is not ap-
propriate for all examples, but suffices for surprisingly many cases
(including those below), and is what the tool currently implements.
Secondly, we define alpha equivalence and capture-avoiding substi-
tution for arbitrary binding specifications, clarifying several issues.
Implementing this is future work, but we prove (on paper) that un-
der usable conditions the two notions of substitution coincide.
• Compilation to proof assistant code (§4). From a single defini-
tion in the metalanguage, the ott tool can generate proof assistant
definitions in Coq, HOL, Isabelle, and (in progress) Twelf. These
can then be used as a basis for formal proof and (where the proof
assistant permits) code extraction and animation.
This compilation deals with the syntactic idiosyncrasies of the
different targets and, more fundamentally, encodes features that
are not directly translatable into each target. The main issues are:
dependency analysis; support for common list idioms; generation
and use of subgrammar predicates; generation of substitution and
free variable functions; (for Isabelle) a tuple encoding for mutually
primitive recursive functions, with auxiliary function definitions
for nested pattern matching and for nested list types; (for Coq
and Twelf) generation of auxiliary list types for the syntax and
semantics; (for Coq) generation of useful induction principles when
using native lists; (for HOL) a stronger definition library, and (for
Twelf) translation of functions into relations.
We aim to generate well-formed and idiomatic definitions, with-
out dangling proof obligations, and in good taste as a basis for user
proof development.
• Substantial case studies (§5). The usefulness of the ott meta-
language and tool has been tested in several case studies. We have
defined type systems and operational semantics for:
metavar termvar, x ::= {{ com term variable }}
{{ isa string}} {{ coq nat}} {{ hol string}} {{ coq-equality }}
{{ ocaml int}} {{ lex alphanum}} {{ tex \mathit{[[termvar]]} }}
grammar
t :: ’t_’ ::= {{ com term }}
| x :: :: Var {{ com variable}}
| \ x . t :: :: Lam (+ bind x in t +) {{ com lambda }}
| t t’ :: :: App {{ com app }}
| ( t ) :: M:: Paren {{ icho [[t]] }}
| { t / x } t’ :: M:: Tsub
{{ icho (tsubst_t [[t]] [[x]] [[t’]])}}
v :: ’v_’ ::= {{ com value }}
| \ x . t :: :: Lam {{ com lambda }}
terminals :: ’terminals_’ ::=
| \ :: :: lambda {{ tex \lambda }}
| --> :: :: red {{ tex \longrightarrow }}
subrules
v <:: t
substitutions
single t x :: tsubst
defns
Jop :: ’’ ::=
defn
t1 --> t2 :: ::reduce::’’ {{ com [[t1]] reduces to [[t2]]}} by
-------------------------- :: ax_app
(\x.t12) v2 --> {v2/x}t12
t1 --> t1’
-------------- :: ctx_app_fun
t1 t --> t1’ t
t1 --> t1’
-------------- :: ctx_app_arg
v t1 --> v t1’
Figure 1. A small ott source file, for an untyped CBV lambda
calculus, with data for Coq, HOL, Isabelle, LATEX, and OCaml.
(1) small lambda calculi: simply typed (*) and with ML polymor-
phism, both call-by-value (CBV);
(2) systems from TAPL (Pierce 2002) including booleans, naturals,
functions, base types, units, seq, ascription, lets, fix, products,
sums, tuples, records, and variants; (*)
(3) the path-based module system of Leroy (1996), with a term
language and operational semantics based on Owens and Flatt
(2006);
(4) a language for rely-guarantee and separation logic Vafeiadis
and Parkinson (2007); (*)
(5) formalisation of the core ott binding specifications;
(6) Lightweight Java (LJ), a small imperative fragment of Java; (*)
(7) formalisations of Java module system proposals, based on JSR
277/294 (including LJ, around 140 semantic rules); and
(8) a large core of OCaml, including record and datatype definitions
(around 306 semantic rules). (*)
For the starred systems soundness results have been proved or
are well in progress, in one or more proof assistants. The TAPL
and LJ examples show that a very simple form of modular seman-
tics provided by ott can be effective — those TAPL features are
defined in separate files, roughly following the structure of the Tin-
kerType repository used to build the original text (Levin and Pierce
2003). Most of the examples were not done by the tool developers,
and (4–8) were primarily driven by other research goals, so we can
Sign up today - FREE
Mendeley saves you time finding and organizing research. Learn more
- All your research in one place
- Add and import papers easily
- Access it anywhere, anytime



