Modular languages for systems and synthetic biology
Available from hdl.handle.net
Page 1
Modular languages for systems and synthetic biology
Modular Languages for Systems and
Synthetic Biology
Michael Pedersen
T H E
U N I V E R S I T YO F E D I N B U R G
H
Doctor of Philosophy
Laboratory for Foundations of Computer Science
School of Informatics
University of Edinburgh
2010
Synthetic Biology
Michael Pedersen
T H E
U N I V E R S I T YO F E D I N B U R G
H
Doctor of Philosophy
Laboratory for Foundations of Computer Science
School of Informatics
University of Edinburgh
2010
Page 3
Abstract
Systems biology is a rapidly growing field which seeks a refined quantitative under-
standing of organisms, particularly studying how molecular species such as metabo-
lites, proteins and genes interact in cells to form the complex emerging behaviour
exhibited by living systems. Synthetic biology is a related and emerging field which
seeks to engineer new organisms for practical purposes. Both fields can benefit from
formal languages for modelling, simulation and analysis.
In systems biology there is however a trade-off in the landscape of existing formal
languages: some are modular but may be difficult for some biologists to understand
(e.g. process calculi) while others are more intuitive but monolithic (e.g. rule-based
languages). The first major contribution of this thesis is to bridge this gap with a Lan-
guage for Biochemical Systems (LBS). LBS is based on the modular Calculus of Bio-
chemical Systems and adds e.g. parameterised modules with subtyping and a notion of
nondeterminism for handling combinatorial explosion. LBS can also incorporate other
rule-based languages such as Kappa, hence adding modularity to these. Modularity is
important for a rational structuring of models but can also be exploited in analysis as
is shown for the specific case of Petri net flows.
On the synthetic biology side, none of the few existing dedicated languages allow
for a high-level description of designs that can be automatically translated into DNA
sequences for implementation in living cells. The second major contribution of this
thesis is exactly such a language for Genetic Engineering of Cells (GEC). GEC exploits
the recent advent of standard genetic parts (“biobricks”) and allows for the composition
of such parts into genes in a modular and abstract manner using logical constraints.
GEC programs can then be translated to DNA sequences using a constraint satisfaction
engine based on a given database of genetic parts.
iii
Systems biology is a rapidly growing field which seeks a refined quantitative under-
standing of organisms, particularly studying how molecular species such as metabo-
lites, proteins and genes interact in cells to form the complex emerging behaviour
exhibited by living systems. Synthetic biology is a related and emerging field which
seeks to engineer new organisms for practical purposes. Both fields can benefit from
formal languages for modelling, simulation and analysis.
In systems biology there is however a trade-off in the landscape of existing formal
languages: some are modular but may be difficult for some biologists to understand
(e.g. process calculi) while others are more intuitive but monolithic (e.g. rule-based
languages). The first major contribution of this thesis is to bridge this gap with a Lan-
guage for Biochemical Systems (LBS). LBS is based on the modular Calculus of Bio-
chemical Systems and adds e.g. parameterised modules with subtyping and a notion of
nondeterminism for handling combinatorial explosion. LBS can also incorporate other
rule-based languages such as Kappa, hence adding modularity to these. Modularity is
important for a rational structuring of models but can also be exploited in analysis as
is shown for the specific case of Petri net flows.
On the synthetic biology side, none of the few existing dedicated languages allow
for a high-level description of designs that can be automatically translated into DNA
sequences for implementation in living cells. The second major contribution of this
thesis is exactly such a language for Genetic Engineering of Cells (GEC). GEC exploits
the recent advent of standard genetic parts (“biobricks”) and allows for the composition
of such parts into genes in a modular and abstract manner using logical constraints.
GEC programs can then be translated to DNA sequences using a constraint satisfaction
engine based on a given database of genetic parts.
iii
Page 4
Acknowledgements
I thank Microsoft Research for its funding through the European PhD Scholarship Pro-
gramme. I thank Gordon Plotkin for his patient supervision, and support in all things
academic, over the past three years, and for his role in co-authoring two papers on
LBS, one of which is incorporated into this thesis. I also thank Andrew Phillips for su-
pervising me during a stimulating internship experience at Microsoft Research and for
his role in co-authoring a paper on GEC which is incorporated into this thesis; many
of the diagrams in Chapter 8 are entirely due to him. I thank Vincent Danos for his en-
thusiasm and many inspiring discussions during and after Gordon’s sabbatical. I thank
Nicolas Oury for useful discussions on LBS; Monika Heiner for useful discussions on
Petri net flows; Stuart Moodie and Anatoly Sorokin for useful discussions on SBGN;
William Chen for answering questions about the ErbB pathway model; and Jane Hill-
ston and Luca Cardelli for useful feedback in their role as my thesis examiners.
I am grateful to all my friends and colleagues in the School of Informatics who
have helped make everyday life as a PhD student more fun. I am grateful to my family
for always supporting me in what I want to do, even when this puts distance between
us. Finally, I am grateful to Ros Marvin for her constant support and for continuing to
remind me of what is truly important in life.
iv
I thank Microsoft Research for its funding through the European PhD Scholarship Pro-
gramme. I thank Gordon Plotkin for his patient supervision, and support in all things
academic, over the past three years, and for his role in co-authoring two papers on
LBS, one of which is incorporated into this thesis. I also thank Andrew Phillips for su-
pervising me during a stimulating internship experience at Microsoft Research and for
his role in co-authoring a paper on GEC which is incorporated into this thesis; many
of the diagrams in Chapter 8 are entirely due to him. I thank Vincent Danos for his en-
thusiasm and many inspiring discussions during and after Gordon’s sabbatical. I thank
Nicolas Oury for useful discussions on LBS; Monika Heiner for useful discussions on
Petri net flows; Stuart Moodie and Anatoly Sorokin for useful discussions on SBGN;
William Chen for answering questions about the ErbB pathway model; and Jane Hill-
ston and Luca Cardelli for useful feedback in their role as my thesis examiners.
I am grateful to all my friends and colleagues in the School of Informatics who
have helped make everyday life as a PhD student more fun. I am grateful to my family
for always supporting me in what I want to do, even when this puts distance between
us. Finally, I am grateful to Ros Marvin for her constant support and for continuing to
remind me of what is truly important in life.
iv
Page 5
Declaration
I declare that this thesis was composed by myself, that the work contained herein is
my own except where explicitly stated otherwise in the text, and that this work has not
been submitted for any other degree or professional qualification except as specified.
(Michael Pedersen)
v
I declare that this thesis was composed by myself, that the work contained herein is
my own except where explicitly stated otherwise in the text, and that this work has not
been submitted for any other degree or professional qualification except as specified.
(Michael Pedersen)
v
Page 235
Appendix C
Proofs
C.1 Proofs for Compartment Value Lists
Proposition 5.1.1. By induction in jfvcigj. In the following we additionally use a and
b to range over compartment values.
Basis (fvcig= /0). Holds vacuously.
Step (fvcig[fv
0
cg).
Acyclic: by the induction hypothesis, Gfvcig is acyclic. Also Gfv
0
cg is acyclic,
for otherwise v0c would take the form vc
0
1 avc
0
2 avc
0
3, and it follows from well-
typedness that the compartment value a must include itself as an ancestor; this is
impossible since compartment values are finite. Suppose towards a contradiction
that there is a cycle in G(fvcig[fv
0
cg). This can then only arise from a branch in
Gfvcig of the form vc1 avc2 bvc3 and v
0
c of the form vc
0
1 bvc
0
2 avc
0
3, both of which
are well-typed. This means that the compartment value a must include b as an
ancestor, and b in turn must include a as an ancestor. Hence a must include itself
as an ancestor. But this is impossible since compartment values are finite.
Max one parent: by the induction hypothesis, each node in Gfvcig has at most
one parent. Also each node in Gfv0cg has at most one parent, for otherwise the
graph would contain a cycle. Suppose towards a contradiction that there is some
node a in G(fvcig[fv
0
cg) with two parents. This can only arise from a branch in
Gfvcig of the form vc1 bavc2 and v
0
c of the form vc
0
1 cavc
0
2 with b 6= c. But this is
impossible since both lists are well-typed and a can contain only a single parent.
223
Proofs
C.1 Proofs for Compartment Value Lists
Proposition 5.1.1. By induction in jfvcigj. In the following we additionally use a and
b to range over compartment values.
Basis (fvcig= /0). Holds vacuously.
Step (fvcig[fv
0
cg).
Acyclic: by the induction hypothesis, Gfvcig is acyclic. Also Gfv
0
cg is acyclic,
for otherwise v0c would take the form vc
0
1 avc
0
2 avc
0
3, and it follows from well-
typedness that the compartment value a must include itself as an ancestor; this is
impossible since compartment values are finite. Suppose towards a contradiction
that there is a cycle in G(fvcig[fv
0
cg). This can then only arise from a branch in
Gfvcig of the form vc1 avc2 bvc3 and v
0
c of the form vc
0
1 bvc
0
2 avc
0
3, both of which
are well-typed. This means that the compartment value a must include b as an
ancestor, and b in turn must include a as an ancestor. Hence a must include itself
as an ancestor. But this is impossible since compartment values are finite.
Max one parent: by the induction hypothesis, each node in Gfvcig has at most
one parent. Also each node in Gfv0cg has at most one parent, for otherwise the
graph would contain a cycle. Suppose towards a contradiction that there is some
node a in G(fvcig[fv
0
cg) with two parents. This can only arise from a branch in
Gfvcig of the form vc1 bavc2 and v
0
c of the form vc
0
1 cavc
0
2 with b 6= c. But this is
impossible since both lists are well-typed and a can contain only a single parent.
223
Sign up today - FREE
Mendeley saves you time finding and organizing research. Learn more
- All your research in one place
- Add and import papers easily
- Access it anywhere, anytime
Start using Mendeley in seconds!
Readership Statistics
17 Readers on Mendeley
by Discipline
6% Physics
by Academic Status
47% Ph.D. Student
18% Student (Bachelor)
12% Professor
by Country
35% United States
24% United Kingdom
12% France


