Bayesian design of synthetic biological systems
Abstract
Here we introduce a new design framework for synthetic biology that exploits the advantages of Bayesian model selection. We will argue that the difference between inference and design is that in the former we try to reconstruct the system that has given rise to the data that we observe, whereas in the latter, we seek to construct the system that produces the data that we would like to observe, i.e., the desired behavior. Our approach allows us to exploit methods from Bayesian statistics, including efficient exploration of model spaces and high-dimensional parameter spaces, and the ability to rank models with respect to their ability to generate certain types of data. Bayesian model selection furthermore automatically strikes a balance between complexity and (predictive or explanatory) performance of mathematical models. To deal with the complexities of molecular systems we employ an approximate Bayesian computation scheme which only requires us to simulate from different competing models to arrive at rational criteria for choosing between them. We illustrate the advantages resulting from combining the design and modeling (or in silico prototyping) stages currently seen as separate in synthetic biology by reference to deterministic and stochastic model systems exhibiting adaptive and switch-like behavior, as well as bacterial two-component signaling systems.
Chris P. Barnes (a,b,1), Daniel Silk (a,b), Xia Sheng (a,b), and Michael P. H. Stumpf (a,b,c,d,1)
(a) Center for Bioinformatics, Division of Molecular Biosciences; (b) Institute of Mathematical Sciences; (c) Center for Integrative Systems Biology; and
(d) Institute of Chemical Biology, Imperial College London, London SW7 2AZ, United Kingdom
Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved July 27, 2011 (received for review December 1, 2010)
biochemical circuits ∣ dynamical systems ∣ robustness
As we are beginning to understand the mechanisms governing biological systems we are starting to identify potential ways
of guiding or controlling the behavior of cellular and molecular
systems. Rationally reengineering organisms for biomedical or
biotechnological purposes has become the central aim of the
fledgling discipline of synthetic biology. By redirecting regulatory
and physical interactions or by altering molecular binding affi-
nities we may, for example, control metabolic processes (1, 2) or
alter intra- and intercellular communication and decision making
processes (3, 4). The range of potential applications of such
engineered systems is vast: designing microbes for biofuel pro-
duction (5, 6) and bioremediation (7); developing control strate-
gies which drive stem cells through the various decisions to
become terminally differentiated (or back) (8, 9), with the aim of
developing novel therapeutics (10, 11); construction of new drug-
delivery systems with homing microbes delivering molecular
medicines directly to the site where they are needed (12); use
of bacteria or bacterial populations (employing swarming and
quorum sensing) as biosensors (13); and gaining better under-
standing of all manner of biological systems by systematically
probing their underlying molecular machinery.
A range of tools and building blocks for such engineered
biological systems are now available which allow us to, at least in
principle, build such systems from simple and reusable biological
components (14). In electronic systems, such modularity has been
crucial and has allowed the cost-effective production of reliable
components that can be combined to produce desired outputs.
Biology, however, poses different and novel challenges that are
intimately linked to the biophysical and biochemical properties
of biomolecules and the media in which they are suspended.
Especially in crowded environments such as found inside living
cells the lack of insulation between different components, i.e., the
very real possibility of undesired cross talk, can create problems;
with increasing miniaturization, similar (albeit quantum) effects
are now also surfacing in electronic circuits (15).
As synthetic biology gears up to bring engineering methods
and tools to bear on biological problems the way in which we
manipulate biological systems and processes is likely to change.
Historically, each new branch of engineering has gone through a
phase of what can be described as tinkering before rationally
planned and executed designs became commonplace. Arguably,
this practice is the current state of synthetic biology and it has
indeed been suggested that the complexity of synthetic biological
systems over the past decade has reached a plateau (16). From
the earliest days, explicit quantitative modeling of systems has
been integral to the vision and practice of synthetic biology
and it will become increasingly important in the future. The abil-
ity to model how a natural or synthetic system will perform under
controlled conditions must in fact be seen as one of the hallmarks
of success of the integrative approaches taken by systems and
synthetic biology.
Here we present a statistical approach to the design of syn-
thetic biological systems that utilizes methods from Bayesian
statistics to train models according to specified input-output char-
acteristics. It incorporates modeling and automated design and is
general in the sense that it can be applied to any system that can
be described by a mathematical model which can be simulated.
Because of the statistical nature of this approach, previously chal-
lenging problems such as handling stochastic models, accounting
for kinetic parameter uncertainty, and incorporating environ-
mental stochasticity can all be handled in a straightforward and
consistent manner.
Bayesian Approach to System Design
The question of how to design a system to perform a specified
task can be viewed as an analogue to reverse engineering. In de-
sign we want to elucidate the most appropriate system to achieve
our design objectives; in reverse engineering we aim to infer the
most probable system structure and dynamics that can give rise to
some observational data. In this respect, the design question can
be viewed as statistical inference on data we wish to observe.
In the Bayesian approach to statistical inference the posterior
distribution is the quantity of interest and is given by the
normalized product of the likelihood and the prior. In most prac-
tical applications the posterior distribution cannot be derived
analytically, but if the likelihood (and prior) can be expressed
mathematically we can use Monte Carlo methods to sample from
the posterior. In many cases where the model structure is complex
the likelihood cannot be written in closed form and traditional
Monte Carlo techniques cannot be applied. These include infer-
ence for the types of stochastic processes encountered in systems
and synthetic biology. In these cases a family of techniques known as approximate Bayesian computation (ABC) can be
Author contributions: C.P.B. and M.P.H.S. designed research; C.P.B., D.S., and X.S.
performed research; and C.P.B. and M.P.H.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
(1) To whom correspondence may be addressed. E-mail: christopher.barnes@imperial.ac.uk
or m.stumpf@imperial.ac.uk.
This article contains supporting information online at www.pnas.org/lookup/suppl/
doi:10.1073/pnas.1017972108/-/DCSupplemental.
15190–15195 ∣ PNAS ∣ September 13, 2011 ∣ vol. 108 ∣ no. 37 www.pnas.org/cgi/doi/10.1073/pnas.1017972108
applied: ABC uses model simulations to approximate the poster-
ior distribution directly. Here we use a sequential Monte Carlo
ABC algorithm known as ABC SMC to move from the prior to
the approximate posterior via a series of intermediate distribu-
tions (17). This framework can also be used to perform Bayesian
model selection (18) and has been implemented in the software
package ABC-SysBio (19).
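The idea of treating the desired behavior as the data can be illustrated with a minimal sketch. It uses plain ABC rejection sampling rather than the ABC SMC algorithm named above, and it does not use the ABC-SysBio package; the toy model, the Uniform(0, 5) prior, the tolerance, and all function names are assumptions chosen only for illustration.

```python
import math
import random

def simulate(k, times):
    """Hypothetical toy model: first-order rise toward 1 with rate k."""
    return [1.0 - math.exp(-k * t) for t in times]

def distance(output, target):
    """Euclidean distance between simulated and desired trajectories."""
    return math.sqrt(sum((x - o) ** 2 for x, o in zip(output, target)))

def abc_rejection(target, times, n_accept, eps, rng):
    """Draw k from an assumed Uniform(0, 5) prior and keep the draws
    whose simulated output lies within eps of the design objective."""
    accepted = []
    while len(accepted) < n_accept:
        k = rng.uniform(0.0, 5.0)
        if distance(simulate(k, times), target) <= eps:
            accepted.append(k)
    return accepted

rng = random.Random(0)
times = [0.5 * i for i in range(11)]
target = simulate(2.0, times)  # the behavior we would like to observe
posterior = abc_rejection(target, times, n_accept=200, eps=0.1, rng=rng)
post_mean = sum(posterior) / len(posterior)
```

The accepted values of `k` approximate the posterior over design parameters that achieve the objective to within the tolerance. A sequential scheme such as ABC SMC would reach a small tolerance more efficiently by passing through a series of decreasing tolerances.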
Fig. 1 depicts the approach presented here. The design objec-
tives are first specified through input-output characteristics.
Here these have been depicted as a single time series, though
the method can be applied in a much broader sense with multiple
inputs and outputs. A set of competing designs is then specified
through deterministic or stochastic models, each containing a
set of kinetic parameters and associated prior distributions. The
distance function measures the discrepancy between the model
output and the objective. In principle it is possible to specify a
distribution over the objective and each model could also contain
experimental error. The ABC SMC algorithm then automatically
evolves the set of models toward the desired design objectives.
The results are a set of posterior probabilities representing
the probability for each design to achieve the specified design
objectives in addition to the posterior probability distribution
of the associated kinetic parameters. This approach is similar
in spirit to some existing methods for the automated design of
genetic networks such as those adopting evolutionary algorithms
(20, 21), Monte Carlo methods (22, 23), or optimization (24–26)
but the advantages of our method over traditional ones are that
we can utilize powerful concepts from Bayesian statistics in the
design of complex biological systems, including
- the rational comparison of models under parameter uncertainty using Bayesian model selection, which automatically accounts for model complexity (number of parameters) and robustness to parameter uncertainty;
- a posterior distribution over possible design parameter values that can be analyzed for parameter sensitivity and robustness and provide credible limits on design parameters;
- the treatment of stochastic systems at the design stage, including the design of systems with required probability distributions on system components; and
- methods for the efficient exploration of high-dimensional parameter space.
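How model selection can account for complexity and robustness may be easier to see in a small sketch. Two hypothetical designs compete to reproduce the same target output: one with a single parameter and one with an additional amplitude parameter. The posterior probability of each design is estimated as its share of accepted simulations. The models, priors, and tolerance are illustrative assumptions, and plain rejection sampling stands in for ABC SMC.

```python
import math
import random

TIMES = [0.5 * i for i in range(11)]
TARGET = [1.0 - math.exp(-2.0 * t) for t in TIMES]  # desired output

def model_1(rng):
    """One parameter: rate k ~ Uniform(0, 5), amplitude fixed at 1."""
    k = rng.uniform(0.0, 5.0)
    return [1.0 - math.exp(-k * t) for t in TIMES]

def model_2(rng):
    """Two parameters: rate k and amplitude a, both ~ Uniform(0, 5)."""
    k, a = rng.uniform(0.0, 5.0), rng.uniform(0.0, 5.0)
    return [a * (1.0 - math.exp(-k * t)) for t in TIMES]

def distance(output):
    """Euclidean distance between a simulated output and the target."""
    return math.sqrt(sum((x - o) ** 2 for x, o in zip(output, TARGET)))

def model_posterior(models, n_accept, eps, rng):
    """Estimate design posteriors as the share of accepted simulations."""
    counts = [0] * len(models)
    while sum(counts) < n_accept:
        m = rng.randrange(len(models))  # uniform prior over designs
        if distance(models[m](rng)) <= eps:
            counts[m] += 1
    return [c / n_accept for c in counts]

probs = model_posterior([model_1, model_2], n_accept=300, eps=0.1,
                        rng=random.Random(1))
```

Both designs can match the target exactly, yet the two-parameter design is accepted far less often because only a small fraction of its prior volume produces the desired behavior. In this toy setting that is the sense in which the ranking penalizes extra parameters and rewards robustness.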
In the following we demonstrate the power of this approach by
examining, from this unique perspective, systems that have been
of interest in the recent literature. First we consider systems that
are capable of biochemical adaptation (27); we then look at the
ability of two bacterial two-component system (TCS) topologies
to achieve particular input-output behaviors, and finally we finish
with an analysis of designs for a stochastic toggle switch with no
cooperative binding at the promoter.
Biochemical Adaptation
Biochemical adaptation refers to the ability of a system to
respond to an input signal and return to the prestimulus steady
state (Fig. 1A). Ma et al. (27) identified two three-node network
topologies that are necessary for biochemical adaptation: a nega-
tive feedback loop with a buffering node and an incoherent
feedforward loop with a proportioner node (IFFLP). Within
these categories they identified eleven simple networks that were
capable of adaptation (Fig. 2A). We applied the Bayesian design
approach to these eleven networks using Michaelis–Menten
kinetic models with and without cooperativity [SI Appendix shows
ordinary differential equations (ODEs) describing these models].
The desired output characteristics were defined through the
adaptation efficiency, E, and sensitivity, S, given by
$$E = \frac{(O_2 - O_1)/O_1}{(I_2 - I_1)/I_1} \quad \text{and} \quad S = \frac{(O_{\mathrm{peak}} - O_1)/O_1}{(I_2 - I_1)/I_1},$$
where $I_1, I_2$ are the input values (here fixed at 0.5 and 0.6, respectively), $O_1, O_2$ are the output steady-state levels before and after the input change, and $O_{\mathrm{peak}}$ is the maximal transient output level. We defined the two-component distance to be $\rho(x, O) = \{E, S^{-1}\}$ such that as $\rho(x, O)$ decreases the behavior approaches the desired behavior. The final population was defined to obey the tolerances $\epsilon = \{0.1, 1.0\}$, which defines close to perfect adaptation (when $O_1 - O_2 \le O_1/50$) and a fractional response equal to the fractional change in input.
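As a hedged illustration of the distance just defined, the sketch below computes E and the inverse of S from a sampled output trajectory and checks them against the tolerances. How O1, O2, and the peak are read off the trajectory, the use of absolute values, and the function names are assumptions made for this example; they are not taken from the SI Appendix.

```python
def adaptation_distance(output, i1=0.5, i2=0.6):
    """Return the two-component distance (E, 1/S) for one trajectory.

    Assumed conventions: output[0] is the prestimulus steady state O1,
    output[-1] is the poststimulus steady state O2, and the peak is the
    largest sample. Absolute values are an added assumption so that
    overshoot and undershoot are penalized alike.
    """
    o1, o2, o_peak = output[0], output[-1], max(output)
    rel_input = (i2 - i1) / i1
    e = abs((o2 - o1) / o1) / rel_input
    s = abs((o_peak - o1) / o1) / rel_input
    return e, (1.0 / s if s > 0 else float("inf"))

def meets_tolerances(dist, eps=(0.1, 1.0)):
    """True when every distance component is within its tolerance."""
    return all(d <= tol for d, tol in zip(dist, eps))

# Hypothetical adapting response: transient peak, then near-complete return.
adapting = [1.0, 1.15, 1.3, 1.2, 1.08, 1.02, 1.01]
# Hypothetical non-adapting response: output settles at a new level.
non_adapting = [1.0, 1.1, 1.2, 1.2, 1.2]

dist = adaptation_distance(adapting)
ok = meets_tolerances(dist)
rejected = not meets_tolerances(adaptation_distance(non_adapting))
```

For the adapting trajectory the steady state shifts by 1% against a 20% input change, so E is about 0.05 and the 30% peak gives S of about 1.5; both components fall within the tolerances. The non-adapting trajectory fails on E.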
The results of the model selection are shown in Fig. 2 B and C.
When cooperativity is not included the most robust designs for
producing the desired input-output characteristics are the inco-
herent feedforward loops, but when cooperativity is added the
posterior shifts significantly toward the negative feedback topol-
ogy. If a system with these requirements were to be implemented
then not only would designs 11 and 4 be clear candidates for
further study, but many of the designs can be effectively ruled
out and the ranking of the models provides a clear strategy for
an experimental program. These results also illustrate how small
changes in context or incomplete understanding of a system can
produce a large change in the most robust design. The Bayesian
framework allows us to incorporate such uncertainty—or safe-
guard against our ignorance—naturally into the design process.
The posterior distribution provides information on which
parameters are correlated and which are the most sensitive to the
desired behavior. The posterior for model 11 under no coopera-
tivity is shown in Fig. 2D, where the ODE model is given by
Fig. 1. Bayesian approach to system design. (A) The design objectives are
encoded by the specification of input and output characteristics. (B) One
or more competing designs for the system are specified together with priors
on the parameters. A distance function, ρ(x, O), relates model output, x, to
the desired output characteristic, O. (C) The system is evolved using sequential
Monte Carlo. Each population more accurately approximates the desired
behavior. (D) The model posterior probability encodes the ability of each
design to achieve the desired behavior. The parameter posterior shows parameters
that are sensitive or insensitive to the input-output specification.