
CaliBayes and BASIS: integrated tools for the calibration, simulation and storage of biological simulation models.

by Yuhui Chen, Conor Lawless, Colin S Gillespie, Jake Wu, Richard J Boys, Darren J Wilkinson
Briefings in Bioinformatics (2010)

Abstract

Dynamic simulation modelling of complex biological processes forms the backbone of systems biology. Discrete stochastic models are particularly appropriate for describing sub-cellular molecular interactions, especially when critical molecular species are thought to be present at low copy-numbers. For example, these stochastic effects play an important role in models of human ageing, where ageing results from the long-term accumulation of random damage at various biological scales. Unfortunately, realistic stochastic simulation of discrete biological processes is highly computationally intensive, requiring specialist hardware, and can benefit greatly from parallel and distributed approaches to computation and analysis. For these reasons, we have developed the BASIS system for the simulation and storage of stochastic SBML models together with associated simulation results. This system is exposed as a set of web services to allow users to incorporate its simulation tools into their workflows. Parameter inference for stochastic models is also difficult and computationally expensive. The CaliBayes system provides a set of web services (together with an R package for consuming these and formatting data) which addresses this problem for SBML models. It uses a sequential Bayesian MCMC method, which is powerful and flexible, providing very rich information in the form of full posterior distributions. However, this approach is exceptionally computationally intensive and requires the use of a carefully designed architecture. Again, these tools are exposed as web services to allow users to take advantage of this system. In this article, we describe these two systems and demonstrate their integrated use with an example workflow to estimate the parameters of a simple model of Saccharomyces cerevisiae growth on agar plates.

Yuhui Chen(2,3), Conor Lawless(2,3), Colin S. Gillespie(1,3), Jake Wu(1), Richard J. Boys(1,3), Darren J. Wilkinson(1,3)
(1) School of Mathematics & Statistics, Newcastle University, UK
(2) Institute for Ageing and Health, and SCMS - Gerontology, Newcastle University, UK
(3) Centre for Integrated Systems Biology of Ageing and Nutrition, Newcastle University, UK
Keywords: Bayesian inference; distributed computing; SBML; stochastic models; web services
INTRODUCTION
As a result of advances in experimental techniques, biology has become a much more quantitative science. The capacity to answer questions ranging in scale from cell and molecular function through to population dynamics requires an increasing ability to acquire, store, and manipulate large volumes of raw data in a flexible, efficient manner. Moreover, there is a growing realization that complex biological processes cannot be understood through the application of ever-more reductionist experimental programmes.
Mathematical modelling can provide key insights into the biological mechanisms underpinning much of the mass of biological data which is currently available. Indeed, there are distinct advantages to modelling a biological process with the rigour needed to build a mathematical model. For example, when constructing a model, gaps in current knowledge are highlighted [19, 27]. Even the very process of specifying a model can identify important quantities which remain unknown or unobserved. Also, qualitative verbal hypotheses are made quantitative, specific and conceptually rigorous [2, 8]. Further, models can yield quantitative as well as qualitative predictions [6, 22].
For all of these reasons, there has been a need to develop systems which aid the modelling and analysis components of the study of biological data. To this end, we have created a web-service-based modelling system known as the Biology of Ageing e-Science Integration and Simulation (BASIS) system [18]. The primary objective of BASIS is to help advance the understanding of the complex biology of ageing, where many different mechanisms act and interact at a range of different levels. Discrete stochastic simulation modelling is particularly relevant to biological investigation. For example, when considering sub-cellular models of biochemical processes where critical species have low copy-number, discrete simulation captures the qualitative difference between the complete absence of a molecule from a system and its presence at very low levels. Models based on the continuum approximation, by contrast, often simply cannot deal with the extinction of a species [21]. Stochastic simulation is also generally appropriate for describing biological systems, because their intrinsic complexity often leads to small interactions (e.g. environmental interactions or cell-cell interactions) which are not included explicitly in a model. Stochastic models therefore often provide an excellent framework for dealing with the combined effect of many unmodelled weak processes and interactions. Furthermore, these models encompass those needed to describe and design time-course biological experiments in which it is not possible to have complete control over the initial conditions of the system (e.g. cell cycle synchrony or isogenicity across a population of cells). Stochastic models play a particularly important role in ageing research, where ageing is typically described as the cumulative result of small amounts of random damage. Describing the propagation of this damage throughout the lifetime of a cell or organism requires appropriate characterisation of random effects, such as DNA damage, protein damage or the accumulation of DNA mutations which lead to tumorigenesis.
Although BASIS has been designed with ageing research in mind, its capabilities are generic to a wide range of other biological systems. Our system aims to make both existing and new models accessible to the research community in a way that allows users to adapt models and to run simulations themselves. It also makes publicly available a relatively complex, powerful and expensive computational architecture that is necessary for inferring parameters in biological stochastic models when using cohorts of simulation results. BASIS has adopted the Systems Biology Markup Language [16] (SBML) as its model description language. SBML is an XML-based computer-readable format for representing models of biochemical reaction networks.
The BASIS project is supported by a team from a wide range of disciplines including the
biological sciences, mathematical and statistical sciences, and computer science. A key aim
is to facilitate collaboration between experimental scientists and mathematical modellers; see
[24] for an example. By sharing and integrating models and data, advances have already been
made in our understanding of ageing. BASIS also provides open source downloadable tools in
addition to a comprehensive online simulation system and modelling environment; see [9] for
details.
The CaliBayes project has a different focus: it studies how to overcome the computational difficulties encountered when estimating parameter values in stochastic models. The CaliBayes tools are also suitable for parameter inference in deterministic models observed with error. The method works essentially through a stochastic comparison of simulated model output obtained from different parameter values with what is often very noisy biological experimental data. The project uses Bayesian methods to provide posterior distributions for parameter values which describe uncertainty about their true values. These distributions provide a more natural representation of knowledge about parameters than do, for example, point estimates obtained from maximum likelihood or least-squares methods. Bayesian methods are also particularly suited to making inferences in complex stochastic biological models using partially observed time-course data [4]. Much biological data is partially observed, whether it consists of a continuous process measured at a few time points or whether key variables or components in the model are not observed at all. The modelling framework we use allows the underlying biological model to be either deterministic or stochastic. The observational (stochastic) error model describes the (random) discrepancies between model outputs and experimental data. Together, these two components define a stochastic model for the experimental data, and it is this overall stochastic model that is calibrated to the data. Another feature of CaliBayes is that it allows prior information about model parameters to be used to inform inferences. This information can be obtained from, for example, the literature or the analysis of previous similar experiments using a simplified model structure and experimental measurement error. Additionally, distributional information can be included about the initial levels of model variables.
The goal of this project is to provide a complete suite of tools necessary for performing Bayesian parameter inference in biological models. These include: (i) an R package for formatting experimental data and the user's prior beliefs about parameter values and initial conditions of model variables (typically initial species values or concentrations); (ii) web service tools for forward simulation of these models (deterministically, stochastically or by using hybrid methods); and (iii) tools for parameter inference. The CaliBayes R package (calibayesR) also provides an optional interface to the CaliBayes web services within the R computing environment. Novel methods for Bayesian inference in stochastic biological models have been developed and tested by our group [11, 12, 13, 14]. The CaliBayes project exploits these techniques, together with significant associated computing power, and makes them available for public access via web services.
BASIS and CaliBayes are unusual in that they allow users to interact with these advanced
modelling facilities through a web service API. Furthermore, parts of the BASIS system can
be accessed through a user-friendly web interface (which utilises the same web services), or
downloaded and implemented locally.
The use of web service technology in biological modelling is gradually increasing along with the ambition of modellers to study ever larger and more complex systems, in greater detail, using more accurate modelling techniques. Although such systems currently run typically on small clusters in academic institutions, looking ahead there is likely to be increasing use of large computing grids and cloud computing approaches. Software systems exploring the use of web service interfaces include popular simulation tools such as COPASI [15] and the Systems Biology Workbench [26]. This is in addition to web-based modelling systems [20], which often exist primarily to provide cross-platform services that do not require downloading and installing software.
The software described in this paper is available from www.calibayes.ncl.ac.uk/Documentation
and www.basis.ncl.ac.uk/software.
THE BASIS SYSTEM
BASIS is a system for the definition, simulation and visualisation of stochastic models written in SBML. It is a collection of software tools running on a large cluster of CPUs and is exposed through several web services (see Figure 1), which also drive a simple web-based GUI. One tool we provide for using the BASIS web services is a library of R functions (basisR) that access the web services directly. The web services interact with a PostgreSQL database [23], which stores user information, models and simulation results, and triggers simultaneous simulation jobs from different users. Simulation jobs are dispatched via the Condor [28] job scheduler, which efficiently distributes parallel jobs (for cohorts of stochastic simulations from a single run by a single user) across a 96-CPU Beowulf cluster. All details of the underlying technology are hidden from the user.
To interact with the services that BASIS provides, a user must first register: this is simply to allow the user to retrieve their models and simulation results. A user can register either by visiting the web-site [1] or by using the createUser web service. When registering, a valid email address is required to discourage potential abuse of the system.
The majority of the web services provided by BASIS require a session ID as an argument. A
session ID is obtained with the getSessionId web service. As each user logs on, a unique valid
session ID is generated and returned to the user, with each being rendered invalid after several
hours of inactivity.
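The session lifecycle just described can be pictured with a small base-R sketch. This is not BASIS code: the function names, the random hexadecimal IDs and the four-hour idle timeout (standing in for "several hours") are hypothetical, and the current time is passed in explicitly so the behaviour is easy to check.

```r
# Hypothetical sketch (not the BASIS implementation) of sliding session expiry:
# each successful use refreshes a session's last-activity time, and an ID is
# rejected once it has been idle for longer than `timeout` seconds.
new_session_store <- function(timeout = 4 * 3600) {
  env <- new.env()
  env$last_seen <- numeric(0)  # named vector: session ID -> last activity time
  env$timeout <- timeout
  env
}

create_session <- function(store, now) {
  # Random 32-character hexadecimal identifier
  id <- paste(sample(c(0:9, letters[1:6]), 32, replace = TRUE), collapse = "")
  store$last_seen[id] <- now
  id
}

touch_session <- function(store, id, now) {
  # Unknown IDs are invalid
  if (!id %in% names(store$last_seen)) return(FALSE)
  # Expired IDs are invalidated and forgotten
  if (now - store$last_seen[[id]] > store$timeout) {
    store$last_seen <- store$last_seen[names(store$last_seen) != id]
    return(FALSE)
  }
  # Valid IDs have their inactivity clock reset
  store$last_seen[id] <- now
  TRUE
}
```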
When a user first places an SBML model into the BASIS system, the model is designated private and is accessible only to that user. A user can make their model public (after publication, say), but once a model is made public it cannot be deleted.
Every model entered into the BASIS system is assigned a model Uniform Resource Name (URN) as a unique identifier. The model URN has the form urn:basis.ncl:model:#1, where #1 is an integer. A user can simulate from their model via the Gillespie algorithm by using a stochastic simulator (called Gillespie2), which is built using the efficient GNU Scientific Library and libSBML [3]. The simulator currently supports local and global parameters, events (without delays), assignment rules and randomly distributed parameters and species. It can be downloaded separately and installed on local machines if required [9]. One of the novel features of BASIS is that when a model has been simulated on the system, the results are automatically associated with that particular model. Therefore, when a model is made public, all its associated simulation data also become public. This allows users to share their results with others and thereby reduces the overall computational load on the system (stochastic simulations are generally much slower than deterministic ones).

Figure 1: BASIS Architecture. [Diagram components: a web server with web pages; a load balancer in front of three application servers, each providing the web services and a database replica; a mainframe / primary database server holding the primary database, the business logic processing engine, the Condor HTC framework and dependability mechanisms; and a Condor pool of simulators.]
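To make the Gillespie algorithm concrete, here is a minimal base-R sketch of its direct method for the single-reaction logistic model used later in Listing 1 (S -> S + S with hazard r S (1 - S/K)). It is an illustration only, not the Gillespie2 implementation; the function name and arguments are hypothetical, and the hazard is clamped at zero so that growth stops once S reaches K. Output is recorded on a regular grid, mimicking the thinned output that BASIS stores.

```r
# Direct-method Gillespie simulation (illustrative sketch, not Gillespie2)
# of the single reaction S -> S + S with hazard h(S) = r * S * (1 - S / K).
gillespie_logistic <- function(S0, r, K, t_max, dt_out) {
  out_times <- seq(0, t_max, by = dt_out)  # regular (thinned) output grid
  out <- numeric(length(out_times))
  t <- 0
  S <- S0
  i <- 1
  repeat {
    h <- max(0, r * S * (1 - S / K))       # clamp: no births once S >= K
    # Waiting time to the next event is exponential with rate h
    t_next <- if (h > 0) t + rexp(1, rate = h) else Inf
    # The state is constant until t_next, so record it at every grid
    # point that falls before the next event
    while (i <= length(out_times) && out_times[i] < t_next) {
      out[i] <- S
      i <- i + 1
    }
    if (i > length(out_times)) break       # grid filled: stop simulating
    t <- t_next
    S <- S + 1                             # fire the proliferation reaction
  }
  data.frame(time = out_times, S = out)
}
```

Repeating the call, for example replicate(100, gillespie_logistic(20, 2.5, 8000, 5, 0.1)$S), gives a cohort of independent trajectories; runs like these are the kind of work that BASIS distributes across its cluster.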
Access to the simulator is again made via the web service interface. All simulation groups (cohorts of stochastic simulations) are given a simulation group URN of the form
urn:basis.ncl:model:#1:simulation:#2-#3:#4
where #1 is an integer referring to the model being simulated, #2 is the time of the final simulation point, #3 is the level of thinning used when storing the output, and #4 is an integer that ensures the URN is unique; an optional number #5 may additionally refer to a specific simulation. So, for example,
urn:basis.ncl:model:401:simulation:100000-1000:418
refers to the simulation of model urn:basis.ncl:model:401 from time 0 to time 100000, outputting values at times t = 0, 100, 200, ..., 100000.
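As an illustration of this naming scheme, the helper below splits a simulation group URN into its components. It is a hypothetical client-side sketch, not part of basisR, and it assumes that the optional simulation number #5 is appended as a further colon-separated field, which the text above does not specify.

```r
# Hypothetical helper: split a BASIS simulation-group URN of the form
#   urn:basis.ncl:model:<#1>:simulation:<#2>-<#3>:<#4>
# into its components. A trailing ":<#5>" is ASSUMED to carry the optional
# simulation number.
parse_simulation_urn <- function(urn) {
  parts <- strsplit(urn, ":", fixed = TRUE)[[1]]
  ok <- length(parts) %in% c(7, 8) &&
    identical(parts[1:3], c("urn", "basis.ncl", "model")) &&
    parts[5] == "simulation" &&
    grepl("^[0-9]+-[0-9]+$", parts[6])
  if (!ok) stop("not a BASIS simulation-group URN: ", urn)
  time_thin <- as.numeric(strsplit(parts[6], "-", fixed = TRUE)[[1]])
  list(
    model_urn  = paste(parts[1:4], collapse = ":"),  # urn:basis.ncl:model:#1
    final_time = time_thin[1],                       # #2
    thinning   = time_thin[2],                       # #3
    group_id   = as.integer(parts[7]),               # #4
    sim_index  = if (length(parts) == 8) as.integer(parts[8]) else NA_integer_
  )
}
```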
Users can submit as many simulations as they need. However, there is no guarantee that they will all run in parallel. User-specific limitations on the number of simultaneous simulations have been implemented, and the rate at which jobs complete depends on the load experienced by the computing cluster at any time. The job queue is managed by Condor [28].
Web-service methods and portal
The web services provided by BASIS can be roughly split into three areas: user, model and
simulation services. The user services deal with the mundane but important matters such as
logging on to BASIS, changing passwords and retrieving a lost password. Such services include,
inter alia, createUser and getSessionId.
The model services allow users to obtain, submit and view SBML models. A few example services are:
• putSBML(sessionId, sbml)
  Puts an SBML model into your private space. The model must be valid SBML.
• delSBML(sessionId, modelUrn)
  Deletes the model from the database, together with any associated simulation data.
• getMyModelInfo(sessionId)
  Returns information regarding a user's private model space.
The simulation services deal with submitting a model and retrieving the results. When a model is private, all associated results are also private. Example services are:
• simulate(sessionId, modelUrn, runName, maxTime, no of sims, no of iters)
  Sets off a stochastic simulation on the BASIS system. The service returns a simulation URN.
• killSimulation(sessionId, simulationUrn)
  Stops a simulation but does not delete the simulation data from the database.
The computer-readable WSDL file describing all the BASIS web services is available from
http://basis.ncl.ac.uk/basisJob.wsdl
In addition to web services, BASIS has a user-friendly web portal interface [1] which describes the range of services available. After logging on, users are initially presented with the list of models in their private model space. From this page, they can add, delete or simulate their SBML models. A workflow showing how to use BASIS web services to store SBML models and to forward simulate from within the R environment is shown as part of the diagram in Figure 3.
THE CALIBAYES SYSTEM
CaliBayes is a suite of tools for performing parameter inference in stochastic biological models specified in SBML using experimental data. Its architecture is shown in Figure 2. Parameter inference (or model calibration) for such models is typically performed by comparing discretely observed time-course experimental data with simulated results from a mathematical dynamic simulation model describing the biological system of interest. Often these models are based around sets of coupled differential and algebraic equations (DAEs) and fitted by using least squares. This process is equivalent to fitting the stochastic model whose mean is described by the DAE and whose errors are independent and normally distributed. Typically these methods are used only to provide point estimates of parameter values. CaliBayes, on the other hand, uses such simulators to perform rapid parameter inference on this type of model (and other inherently stochastic models) by using Bayesian sequential MCMC methods to obtain posterior distributions which describe uncertainty about model parameter values. The hardware and algorithms driving CaliBayes are made available via simple web services and an R library (calibayesR). CaliBayes has been designed to be completely modular and can utilise any SBML-compliant simulation engine (deterministic or stochastic) via a transparent interface. It is unique in its ease of use and in the richness of the biologically relevant information it provides.

Figure 2: CaliBayes Architecture. A range of client software tools can interact with the CaliBayes web services, which consist of a mainframe server running the CaliBayes algorithm and many simulator engines. [Diagram components: Java, Python, R and Taverna client workflows; a load balancer in front of three application servers providing the web services; the CaliBayes API (EJB) on the mainframe / primary database server; and the Roadrunner, FERN, COPASI and BASIS simulators.]
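The equivalence between least squares and a stochastic model with Gaussian errors can be written out explicitly. Writing y_i for the observation at time t_i, μ(t_i; θ) for the DAE solution under parameters θ, and τ for the error precision (the role played by S.tau in the example later), the log-likelihood is as below. For fixed τ, maximising it over θ is exactly minimising the residual sum of squares, whereas a Bayesian analysis combines the same likelihood with a prior.

```latex
\begin{aligned}
y_i &= \mu(t_i;\theta) + \varepsilon_i, \qquad
  \varepsilon_i \overset{\text{iid}}{\sim} \mathrm{N}(0,\,1/\tau), \qquad i = 1,\dots,n,\\
\log L(\theta,\tau) &= \frac{n}{2}\log\frac{\tau}{2\pi}
  \;-\; \frac{\tau}{2}\sum_{i=1}^{n}\bigl(y_i - \mu(t_i;\theta)\bigr)^2,\\
p(\theta,\tau \mid y) &\propto L(\theta,\tau)\, p(\theta,\tau).
\end{aligned}
```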
Stochastic models are often preferred to deterministic models in a systems biology context for several reasons. For example, environmental interactions and initial system states can be imperfectly characterised or ignored in biological systems due to their complexity. Also, even when it is possible to replicate initial conditions exactly, repeated experiments often produce different outcomes. This inherent stochasticity is best captured by using a stochastic model.
Discrete effects are particularly important in a systems biology context when modelling species with low copy numbers, for example, on the molecular scale. In this situation, the discreteness of the underlying biological process plays an important role in producing the experimental data. This effect is well known and is used in, for example, epidemiological models to capture the qualitative difference between the complete absence of a species (irreversible extinction) and its presence at a very low level (reversible decline) [21]. Discrete stochastic models are a natural way to describe the interaction of biochemical species [10], neatly capturing both stochasticity and discreteness. The main simulation engines used by CaliBayes are COPASI [15], FERN [7] and BASIS, all of which contain implementations of the discrete stochastic Gillespie algorithm.
Posterior distributions describing uncertainty about parameter values are a natural output of the Bayesian methods used to calibrate our stochastic models. They form an ideal summary of the information about the parameters contained in the experimental data. The output also allows hypotheses to be tested, such as whether there is evidence for differences between parameter values in different experiments. This contrasts with other fitting procedures, whose output consists only of point estimates of model parameter values; this severely restricts the role that modelling can play in hypothesis testing. The posterior distributions also provide information on identifiability and confounding; see [31] for further discussion.
CaliBayes uses sequential Bayesian MCMC methods [12] which are ideally suited to stochastic
kinetic models. Although there are other Bayesian MCMC tools for statistical models or closed-
form representations of dynamic models (e.g. OpenBUGS [29]), none of these can deal with
parameter inference for general DAEs or models described in SBML. These tools also do not
cater for discrete stochastic kinetic models.
The CaliBayes system is deployed, maintained and made publicly available on hardware based at Newcastle University, UK, as a low-powered example of its operation. However, all of its components are freely available for local deployment, which is envisaged to be its primary mode of use. In this way CaliBayes can benefit from the availability of large amounts of computing power, and it scales well to take advantage of the available hardware.
CaliBayes software architecture
The CaliBayes software consists of a number of interacting service components. The components may be deployed on different machines or all on the same machine. For the publicly available system hosted at Newcastle, users can interact with these components immediately by using the web services or by downloading the CaliBayes API [5]. The main calibration services are as follows.
CaliBayes simulator interface: CaliBayes makes use of third-party SBML-compliant simulators for forward simulation. Such simulators may be either deterministic or stochastic, depending on the nature of the model to be calibrated. Any simulator can be used for this purpose, provided that it is also given a SOAP web services interface conforming to the standard CaliBayes simulator interface. Example interfaces are given for the COPASI (deterministic, stochastic and hybrid) and FERN (stochastic; SBML assignment rules not allowed) simulators in the publicly accessible demo system. The simulators are typically used for millions of short simulations per CaliBayes job, so powerful CPUs are useful.
CaliBayes calibration engine: the main back-end computational service implementing the sequential Bayesian MCMC algorithm for model calibration. This service is not intended to be accessed directly by users. The engine typically initiates millions of short simulation runs via the CaliBayes simulator interface, and therefore requires a high-bandwidth connection between its machine and those running the simulators.
CaliBayes data integrator: the main user-level calibration service. This service allows the calibration of a model based on multiple time series, each of which may consist of measurements of different species or other model components, and at different time points.
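The kind of input the data integrator accepts can be pictured as one long-format table in which each row is a single measurement of one species in one experiment. The base-R sketch below stacks several wide-format time series, which may measure different species at different times, into such a table. The function and its column names are hypothetical and do not reproduce the CaliBayes XML format, which calibayesR generates.

```r
# Hypothetical sketch: stack a named list of experiments into one long table.
# Each experiment is a data frame with a `time` column plus one column per
# measured species; NA marks a species not measured at that time.
stack_time_series <- function(series) {
  rows <- lapply(names(series), function(exp_id) {
    df <- series[[exp_id]]
    species <- setdiff(names(df), "time")
    do.call(rbind, lapply(species, function(sp) {
      keep <- !is.na(df[[sp]])  # drop unobserved entries
      data.frame(experiment = exp_id, species = sp,
                 time = df$time[keep], value = df[[sp]][keep],
                 stringsAsFactors = FALSE)
    }))
  })
  do.call(rbind, rows)
}
```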
In addition to the main CaliBayes software components, we have developed a support package (calibayesR) for the R statistical programming language [25]. This package has been designed to make it straightforward to generate, process and visualise the XML documents consumed and produced by the CaliBayes services. It also includes functions for accessing the CaliBayes web services directly from within the R environment. This allows users to take full advantage of the graphical and statistical capabilities of R (which is freely available on all platforms) by writing entire workflows within R, including data formatting, prior generation and posterior visualisation. Note that this does not preclude accessing the web services via any other tool that the user finds more convenient (e.g. Python, Java or Taverna [17]).
A COMBINED BASIS AND CALIBAYES WORKFLOW
In this section we expand on the workflow diagram in Figure 3 by giving the details of a workflow which integrates BASIS and CaliBayes web services to create and calibrate an SBML model of the logistic growth of S. cerevisiae colonies spotted onto solid agar plates. The CaliBayes workflow is simpler when using our calibayesR and basisR packages, and so we describe the workflow within R. For brevity, we give below only the key parts of the workflow. The complete workflow can be found by using the command demo(YeastGrowthDemo) in the calibayesR package.
Listing 1 describes the model in SBML-shorthand [9, 30]. This shorthand provides a very useful
human readable and editable model representation and can reliably convert to and from full
SBML.
Listing 1: The Logistic Model
@model:2.1.2=Logistic Model Yeast Spot Growth
#Model of growth in photographed area of S. cerevisiae spotted onto agar.
#Growth arises from the population dynamics of merging yeast colonies.
@units
substance=item
@compartments
tile=1.0
@parameters
#Carrying capacity (pixels)
K
#Rate parameter (per day)
r
@species
tile:S
@reactions
@r=Proliferation
#Logistic growth is like autocatalysis
S->S+S
r*S*(1-S/K)
This model describes the logistic growth of S. cerevisiae (baker's yeast) spots growing on solid agar plates. Spot size is represented by species S, and K and r are parameters for the spot carrying capacity and growth rate, respectively. We also assume that the data are normally distributed about these model values with measurement error precision S.tau (variance = 1/S.tau).

Figure 3: Combined CaliBayes and BASIS workflow. [Sequence diagram between the User, the R workflow, the BASIS R package, the BASIS web services, the CaliBayes R package and the CaliBayes web services. Labelled steps include: converting the .mod model to SBML; generating the CaliBayes XML file and the settings file (settings.xml); submitting a calibration job, receiving a session ID, checking job status in a loop and retrieving the posterior distribution; putting an SBML model into BASIS and receiving its model URN; submitting forward simulations of the (modified) SBML model, checking job status in a loop and retrieving the simulation results; and uploading the final SBML model, which returns a model URN.]

Figure 4: Three experimental time courses of the growth of genetically identical S. cerevisiae spots on solid agar plates. [Plot of colony area (px, 0 to 10000) against time (days, 0 to 5).]
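The deterministic counterpart of Listing 1, dS/dt = rS(1 - S/K), has a well-known closed-form solution, which is handy for sanity-checking calibrations or generating synthetic data. The base-R sketch below pairs it with the normal error model just described; the function names are hypothetical, and for the stochastic model the ODE solution is only an approximation to the mean trajectory.

```r
# Closed-form solution of dS/dt = r S (1 - S/K) with S(0) = S0
logistic_mean <- function(t, K, r, S0) {
  K * S0 / (S0 + (K - S0) * exp(-r * t))
}

# Synthetic observations: normal noise about the mean with precision tau,
# i.e. standard deviation 1/sqrt(tau) (variance 1/tau, as for S.tau)
simulate_observations <- function(t, K, r, S0, tau) {
  rnorm(length(t), mean = logistic_mean(t, K, r, S0), sd = 1 / sqrt(tau))
}
```

At t = 0 the solution equals S0, it passes K/2 at t = log((K - S0)/S0)/r, and it tends to K as t grows.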
The unknown quantities in the model are K, r, the initial spot size S(0) and the measurement error precision S.tau. These are calibrated to the data as follows. First, the data are split into sequential batches, each containing observations at b time points. Simulating values of the unknown quantities from their prior distribution and then forward simulating from the stochastic model gives a distribution of values for S at each of the time points in the first batch. Comparing this distribution with the observed values in the experimental data tells us which values of the unknown quantities are more plausible (in a probabilistic sense). Continuing this process by including the data in the second batch, then the third (and so on), eventually gives the posterior distributions of these quantities calibrated to the whole dataset.

In the workflow, the SBML-shorthand is first converted to SBML by using our mod2sbml web service. The calibayesR package is then loaded to enable seamless access to the CaliBayes web services from within the R environment:
#Convert the .mod string to an SBML model string
SBML = mod2sbml(LogisticMod, asText=TRUE)
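The batch-wise calibration scheme described above can be sketched in base R. To keep the example short, it replaces the sequential MCMC used by CaliBayes with plain importance resampling, and the stochastic simulator with the deterministic logistic solution; both substitutions, and all function names, are our own assumptions. Without an MCMC move step the particle set degenerates quickly, which is one reason a production engine needs more than this.

```r
# Illustrative sketch only: batch-sequential Bayesian updating by importance
# resampling, standing in for the sequential MCMC used by CaliBayes.
logistic_mean <- function(t, K, r, S0) K * S0 / (S0 + (K - S0) * exp(-r * t))

# prior: data frame of joint prior draws with columns K, r, S0 and tau
# obs:   data frame of observations with columns time and S
# b:     number of distinct time points per batch
sequential_update <- function(prior, obs, b) {
  particles <- prior
  time_rank <- match(obs$time, sort(unique(obs$time)))
  batches <- split(obs, ceiling(time_rank / b))   # batches in time order
  for (batch in batches) {
    # Gaussian log-likelihood of the current batch under each particle
    loglik <- vapply(seq_len(nrow(particles)), function(i) {
      p <- particles[i, ]
      mu <- logistic_mean(batch$time, p$K, p$r, p$S0)
      sum(dnorm(batch$S, mean = mu, sd = 1 / sqrt(p$tau), log = TRUE))
    }, numeric(1))
    w <- exp(loglik - max(loglik))                # weights, rescaled for stability
    idx <- sample.int(nrow(particles), nrow(particles), replace = TRUE, prob = w)
    particles <- particles[idx, , drop = FALSE]   # approximate posterior so far
    rownames(particles) <- NULL
  }
  particles
}
```

Because resampling keeps the number of particles fixed, the returned draws can be summarised in exactly the same way as the prior draws.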
We demonstrate the process by calibrating this model to the experimental dataset shown in
Figure 4. These data are growth measures (spot areas) determined from photographic images of
the plates and are included as a data frame in the calibayesR package. They can be accessed
after loading the SBMLModels package and calling data(LogisticModel). The information
describing prior uncertainty about the unknown quantities (shown in Figure 5) is represented
by similar data frames containing n samples from the prior distribution. Example code for K
and r is given below.
#Prior distributions for the two SBML model parameters
parameters = data.frame(K=rlnorm(n, 9, 0.2), r=rlnorm(n, 1, 0.6))
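It can help to translate these log-normal priors into familiar summaries. The base-R check below computes the median and central 95% interval implied by each prior; it is not part of calibayesR. The intervals come out at roughly 5,500 to 12,000 pixels for K (median exp(9), about 8,100) and about 0.84 to 8.8 per day for r (median e, about 2.7), which appear consistent with the ranges plotted in Figure 5.

```r
# Medians and central 95% intervals implied by the log-normal priors
# K ~ LN(meanlog = 9, sdlog = 0.2) and r ~ LN(meanlog = 1, sdlog = 0.6)
probs <- c(0.025, 0.5, 0.975)
K_q <- qlnorm(probs, meanlog = 9, sdlog = 0.2)
r_q <- qlnorm(probs, meanlog = 1, sdlog = 0.6)
round(rbind(K = K_q, r = r_q), 2)
```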
Figure 5: Prior distribution of the spot carrying capacity K, the growth rate r, the initial spot size S(0) and the error precision S.tau. [Four density plots, with horizontal axes spanning approximately 4000 to 16000 for K, 0 to 12 for r, 0 to 50 for S(0) and 0 to 8e-05 for S.tau.]

These data frames are converted to CaliBayes-compliant XML strings by using the createCalibayes function:
#Create CaliBayes prior object
prior = createCalibayes(parameters, species,
                        distributions, errors, YeastGrowth)
We now update our prior distribution by incorporating the information in the experimental data.
Before proceeding, an XML string describing MCMC tuning parameters such as burn-in and thinning
is created with the createSettings function. As with all calibayesR functions that use CaliBayes
web services, this function requires a working WSDL address giving the location of the local
CaliBayes web services.
#Create MCMC tuning settings
tuning = createSettings(wsdl, burn=100, thin=10, block=3,
                        simulator="copasi -stochastic")
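The burn and thin settings are applied to raw MCMC output. The sketch below uses one common convention: the first burn iterations are discarded, and every thin-th remaining draw is kept. This convention is an assumption, because the text does not say how CaliBayes combines burn, thin and block internally. Under it, retaining N draws with burn = 100 and thin = 10 requires 100 + 10N iterations. The function name burn_thin is hypothetical.

```r
# Discard the first `burn` iterations, then keep every `thin`-th draw
# (assumed convention; CaliBayes' handling of `block` is not modelled here)
burn_thin <- function(chain, burn, thin) {
  stopifnot(burn >= 0, thin >= 1, length(chain) - burn >= thin)
  kept <- chain[seq.int(burn + 1, length(chain))]
  kept[seq(thin, length(kept), by = thin)]
}

# Example: 1,100 raw iterations with burn = 100 and thin = 10
draws <- burn_thin(1:1100, burn = 100, thin = 10)
length(draws)  # 100 retained draws: iterations 110, 120, ..., 1100
```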
Now we are ready to start the CaliBayes engine. First, the SBML model and the tuning and prior
objects are passed to the CaliBayes submit web service, which returns a session ID:
#Get CaliBayes session ID
SID = calibrate(wsdl, LogisticModel, tuning, prior)
This session ID is then used to repeatedly call the CaliBayes isCalibayesReady web service to
check if the job is complete:
isCalibayesReady(wsdl, SID)
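The repeated isCalibayesReady calls amount to a polling loop. The sketch below assumes only that the readiness check returns TRUE once the job has finished. The helper wait_until_ready and its interval and timeout defaults are hypothetical and not part of calibayesR.

```r
# Poll a zero-argument readiness check until it returns TRUE or a timeout passes
wait_until_ready <- function(check, interval = 30, timeout = 3600) {
  start <- Sys.time()
  repeat {
    if (isTRUE(check())) return(invisible(TRUE))
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    # Give up rather than sleep past the timeout
    if (elapsed + interval >= timeout) stop("timed out waiting for the job")
    Sys.sleep(interval)
  }
}

# Hypothetical wiring with the call shown above:
# wait_until_ready(function() isCalibayesReady(wsdl, SID), interval = 60)
```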
Once this web service returns TRUE, we execute the getPosterior web service and receive an
XML document containing values from the posterior distribution:
Figure 7: The posterior predictive distribution of an experimental time course [colony area (px)
against time (days)]. At each time point, the central 95% of the distribution is shown on a grey
scale (by quantile), with the darkest trace giving the path of the median response.
inherent stochasticity in the biological model and the measurement error. It can be obtained
by submitting the model, the posterior distribution for K, r and S.tau, and the prior distribution
for the initial species level S(0), and then making repeated calls to the BASIS forwardSimulate
web service (as shown in the workflow in Figure 3). The posterior predictive distribution for this
model and data is given in Figure 7; it shows that the model fits these data reasonably well.
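Summarising a posterior predictive distribution such as Figure 7 comes down to simulating one trajectory per posterior draw and then taking quantiles at each time point. In this sketch, a closed-form logistic curve plus Gaussian measurement error (sd = 1/sqrt(S.tau)) stands in for calls to forwardSimulate. The "posterior" draws are synthetic placeholders, not the paper's results, and predictive_bands is a hypothetical helper name.

```r
# Closed-form logistic curve (stand-in for BASIS forwardSimulate)
logistic <- function(t, K, r, S0) K / (1 + (K / S0 - 1) * exp(-r * t))

# Simulate one noisy trajectory per draw and summarise each time point
predictive_bands <- function(post, S0, times, probs = c(0.025, 0.5, 0.975)) {
  sims <- sapply(seq_len(nrow(post)), function(i) {
    mu <- logistic(times, post$K[i], post$r[i], S0[i])
    mu + rnorm(length(times), mean = 0, sd = 1 / sqrt(post$tau[i]))
  })
  # sims has one row per time point and one column per draw
  t(apply(sims, 1, quantile, probs = probs))
}

# Synthetic placeholder draws; S0 is drawn from an illustrative prior
set.seed(2)
m <- 1000
post <- data.frame(K = rnorm(m, 8000, 100), r = rnorm(m, 3, 0.1), tau = rep(1 / 200^2, m))
S0 <- runif(m, 1, 50)
times <- seq(0, 5, by = 0.5)
bands <- predictive_bands(post, S0, times)
print(round(bands))
```

To draw a grey-scale plot like Figure 7, you can pass a finer grid such as probs = seq(0.025, 0.975, by = 0.025) and shade between symmetric pairs of quantiles.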
The results of this analysis, in the form of the model and fixed values for the unknown quantities
(set at their posterior medians), have been deposited on the BASIS website and are available for
use by the research community. These results were archived via the basisR package by calling the
BASIS getSessionID web service and then using the returned session ID to submit the SBML model
via the putSBML web service.
CONCLUSION
Stochastic simulation of biological processes is highly desirable, as it can capture significant
intrinsic, unmodelled variation and environmental interactions in complex biological systems.
Discrete simulation models are also particularly useful for modelling biochemical reactions in
which species copy numbers are low, since discrete effects can be important at such low levels.
Until now, discrete stochastic models have been computationally expensive to simulate and
prohibitively expensive to use for parameter inference, rendering these methodologies slow and
impractical. BASIS and CaliBayes are integrated systems for the rapid simulation of cohorts of
discrete stochastic realisations from SBML models, and for parameter inference for these models
based on Bayesian sequential MCMC algorithms, each deployed on a carefully constructed
computational architecture. CaliBayes can also deal with continuous DAE models and provide
posterior distributions for parameters assuming, for example, normal errors.
These systems have been designed to achieve a workload throughput that is sufficiently high
to provide practical and viable tools for discrete stochastic simulation and Bayesian inference
for parameters in these models. This is achieved by using a dedicated computer cluster running
simulation engines, supported by scheduling software, inference algorithms, databases and
high-speed network connections, all exposed to the public via web services. This complex
technical environment is also made available to users (through the same web services) via simple,
easy-to-use R client libraries (calibayesR and basisR), which allow straightforward construction
of prior distributions and plotting of posteriors and simulated results. In this paper we have
demonstrated how to use these packages to access CaliBayes web services in a combined workflow
for inferring logistic equation parameter values from experimental data describing the growth of
S. cerevisiae cultures on solid agar.
The BASIS and CaliBayes systems are computationally intensive. Currently at Newcastle, BASIS runs
on a cluster of 96 CPUs and CaliBayes runs on 32 CPUs. Both systems can be scaled to serve more
users or perform more simulations (thereby reducing queueing time for simulation jobs), and they
scale linearly with available computing power. All software components are freely available for
local deployment. However, a coordinated strategy for sharing resources between academic
institutions, for example by distributing CaliBayes and BASIS jobs across hardware at different
sites worldwide, would best be achieved using GRID technology. An alternative strategy, which
would move responsibility for hardware maintenance away from academic institutions and improve
reliability of service, would be to use commercially available "Cloud" technologies such as
Amazon's EC2. Given the limited financial resources of individual academic institutions and the
lack of long-term funding for research projects, a Cloud-computing solution currently seems to be
the most viable way to achieve ever higher levels of throughput from services such as CaliBayes
and BASIS. However, academic funding models need to evolve somewhat before such an approach is
likely to gain widespread adoption.
Acknowledgement
This work was supported by the Biotechnology and Biological Sciences Research Council [grant
numbers BEP17042, BBSB16550, BBC0082001] with contributions from the EPSRC, MRC,
DTI and Unilever Corporate Research.
References
[1] BASIS. http://www.basis.ncl.ac.uk.
[2] D. Battogtokh and J. J. Tyson. Bifurcation analysis of a model of the budding yeast cell
cycle. Chaos, 14:653–661, 2004.
[3] B. J. Bornstein, S. M. Keating, A. Jouraku, and M. Hucka. LibSBML: an API library for
SBML. Bioinformatics, 24(6):880–881, 2008.
[4] R. J. Boys, D. J. Wilkinson, and T. B. L. Kirkwood. Bayesian inference for a discretely
observed stochastic kinetic model. Statistics and Computing, 18:125–135, 2008.
[5] CaliBayes. http://www.calibayes.ncl.ac.uk/.
[6] K. C. Chen, L. Calzone, A. Csikasz-Nagy, F. R. Cross, B. Novak, and J. J. Tyson.
Integrative analysis of cell cycle control in budding yeast. Molecular Biology of the Cell,
15:3841–3862, 2004.
[7] F. Erhard, C. C. Friedel, and R. Zimmer. FERN - a Java framework for stochastic simulation
and evaluation of reaction networks. BMC Bioinformatics, 9:356, 2008.
[8] C. S. Gillespie, C. J. Proctor, D. P. Shanley, D. J. Wilkinson, R. J. Boys, and T. B. L.
Kirkwood. A mathematical model of ageing in yeast. Journal of Theoretical Biology,
229:189–196, 2004.
[9] C. S. Gillespie, D. P. Shanley, D. J. Wilkinson, R. J. Boys, C. J. Proctor, and T. B. L.
Kirkwood. Tools for the SBML community. Bioinformatics, 22:628–629, 2006.
[10] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of
Physical Chemistry, 81:2340–2361, 1977.
[11] A. Golightly and D. J. Wilkinson. Bayesian inference for stochastic kinetic models using a
diffusion approximation. Biometrics, 61(3):781–788, 2005.
[12] A. Golightly and D. J. Wilkinson. Bayesian sequential inference for stochastic kinetic
biochemical network models. Journal of Computational Biology, 13(3):838–851, 2006.
[13] A. Golightly and D. J. Wilkinson. Bayesian inference for nonlinear multivariate diffusion
models observed with error. Computational Statistics and Data Analysis, 52(3):1674–1693,
2008.
[14] D. A. Henderson, R. J. Boys, K. J. Krishnan, C. Lawless, and D. J. Wilkinson. Bayesian
emulation and calibration of a stochastic computer model of mitochondrial DNA deletions
in substantia nigra neurons. Journal of the American Statistical Association, 104:76–87,
2009.
[15] S. Hoops, S. Sahle, R. Gauges, C. Lee, J. Pahle, N. Simus, M. Singhal, L. Xu, P. Mendes, and
U. Kummer. COPASI - a COmplex PAthway SImulator. Bioinformatics, 22:3067–3074, 2006.
[16] M. Hucka, A. Finney, H. M. Sauro, H. Bolouri, J. C. Doyle, H. Kitano, A. P. Arkin,
B. J. Bornstein, D. Bray, A. Cornish-Bowden, A. A. Cuellar, S. Dronov, E. D. Gilles,
M. Ginkel, V. Gor, I. I. Goryanin, W. J. Hedley, T. C. Hodgman, J. H. Hofmeyr, P. J.
Hunter, N. S. Juty, J. L. Kasberger, A. Kremling, U. Kummer, N. Le Novère, L. M. Loew,
D. Lucio, P. Mendes, E. Minch, E. D. Mjolsness, Y. Nakayama, M. R. Nelson, P. F.
Nielsen, T. Sakurada, J. C. Schaff, B. E. Shapiro, T. S. Shimizu, H. D. Spence, J. Stelling,
K. Takahashi, M. Tomita, J. Wagner, and J. Wang. The Systems Biology Markup
Language (SBML): a medium for representation and exchange of biochemical network
models. Bioinformatics, 19:524–531, 2003.
[17] D. Hull, K. Wolstencroft, R. Stevens, C. Goble, M. Pocock, P. Li, and T. Oinn. Taverna: a
tool for building and running workflows of services. Nucleic Acids Research, 34(Web Server
issue):729–732, July 2006.
[18] T. B. L. Kirkwood, R. J. Boys, C. S. Gillespie, C. J. Proctor, D. P. Shanley, and D. J.
Wilkinson. Towards an e-biology of ageing: integrating theory and data. Nature Reviews
Molecular Cell Biology, 4:243–249, 2003.
[19] A. Kowald and T. B. L. Kirkwood. A network theory of ageing: the interactions of
defective mitochondria, aberrant proteins, free radicals and scavengers in the ageing process.
Mutation Research, 316:209–236, 1996.
[20] D. Lee, R. Saha, F. Khan, W. Park, and I. Karimi. Web-based applications for building,
managing and analysing kinetic models of biological systems. Briefings in Bioinformatics,
10, 2008.
[21] D. Mollison. Dependence of epidemic and population velocities on basic parameters.
Mathematical Biosciences, 107:255–257, 1991.
[22] D. E. Nelson, A. E. Ihekwaba, M. Elliott, J. R. Johnson, C. A. Gibney, B. E. Foreman,
G. Nelson, V. See, C. A. Horton, D. G. Spiller, S. W. Edwards, H. P. McDowell, J. F.
Unitt, E. Sullivan, R. Grimley, N. Benson, D. Broomhead, D. B. Kell, and M. R. White.
Oscillations in NF-κB signaling control the dynamics of gene expression. Science,
306:704–708, 2004.
[23] PostgreSQL. http://www.postgresql.org.
[24] C. J. Proctor, C. Soti, R. J. Boys, C. S. Gillespie, D. P. Shanley, D. J. Wilkinson, and
T. B. L. Kirkwood. Modelling the actions of chaperones and their role in ageing.
Mechanisms of Ageing and Development, 126:119–131, 2005.
[25] R Development Core Team. R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria, 2009. ISBN 3-900051-07-0.
[26] H. M. Sauro, M. Hucka, A. Finney, C. Wellock, H. Bolouri, J. Doyle, and H. Kitano. Next
generation simulation tools: The Systems Biology Workbench and BioSPICE integration.
Omics: a Journal of Integrative Biology, 7:355–372, 2003.
[27] P. D. Sozou and T. B. L. Kirkwood. A stochastic model of cell replicative senescence based
on telomere shortening, oxidative stress, and somatic mutations in nuclear and mitochondrial
DNA. Journal of Theoretical Biology, 213:573–586, 2001.
[28] D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in practice: the Condor
experience. Concurrency - Practice and Experience, 17(2-4):323–356, 2005.
[29] A. Thomas, B. O'Hara, U. Ligges, and S. Sturtz. Making BUGS open. R News, 6:12–17,
2006.
[30] D. J. Wilkinson. Stochastic modelling for systems biology. Chapman & Hall/CRC Press,
2006.
[31] D. J. Wilkinson. Stochastic modelling for quantitative description of heterogeneous
biological systems. Nature Reviews Genetics, 10(2):122–133, 2009.