The SBML discrete stochastic models test suite.
- PubMed: 18025005
Abstract
MOTIVATION: Stochastic simulation is a very important tool for mathematical modelling. However, it is difficult to check the correctness of a stochastic simulator, since any two realizations from a single model will typically be different. RESULTS: We have developed a test suite of stochastic models that have been solved either analytically or using numerical methods. This allows the accuracy of stochastic simulators to be tested against known results. The test suite is already being used by a number of stochastic simulator developers. AVAILABILITY: The latest version of the test suite can be obtained from http://www.calibayes.ncl.ac.uk/Resources/dsmts/ and is licensed under GNU Lesser General Public License.
Systems biology
Thomas W. Evans (1), Colin S. Gillespie (2,3) and Darren J. Wilkinson (2,3,*)
(1) Department of Mathematical Sciences, University of Liverpool, Liverpool, L69 7ZL, (2) School of Mathematics &
Statistics, Newcastle University, Newcastle upon Tyne, NE1 7RU and (3) Centre for Integrated Systems Biology of
Ageing and Nutrition (CISBAN), Newcastle University, UK
* To whom correspondence should be addressed.
Received on September 17, 2007; revised on October 18, 2007; accepted on November 7, 2007
Advance Access publication November 19, 2007
Associate Editor: Thomas Lengauer
Contact: D.J.Wilkinson@ncl.ac.uk
1 INTRODUCTION
In recent years, it has been increasingly recognized that
mathematical modelling can help us to understand complex
biological networks. As a result of this, the Systems Biology
Markup Language (SBML) was developed as a standard
format in which to represent the models (Hucka et al., 2003).
SBML is quickly becoming the lingua franca for the development
and sharing of biochemical network models.
One popular modelling technique is to use a discrete
stochastic kinetic framework. However, testing the correctness
of the implementation of the underlying algorithm is difficult,
since a stochastic simulator will, by design, give a different
realization on every run that uses a different seed. This is
especially problematic since it is possible for two exact
algorithms, such as Gillespie’s direct method (Gillespie, 1977)
and the Next Reaction Method (Gibson and Bruck, 2000), to
have different implementations and to use random number
streams in an entirely different way.
A further complication in establishing the correctness of a
simulator arises from issues of interpretation of the SBML
model representation. The SBML specification contains little
guidance relating to the proper procedures to be followed in
encoding models intended for discrete stochastic simulation
(though the latest specification does contain an example),
leading to potential confusion. Indeed, SBML Level 1 was not
capable of encoding discrete stochastic kinetic models in a
correct, accurate and unambiguous way. Fortunately, SBML
Level 2 and beyond are quite capable in this regard. See the
discussion in Chapter 2 of Wilkinson (2006) for further details.
This article describes how the SBML discrete stochastic
models test suite (DSMTS) can be used to test a stochastic
simulator. Versions of the test suite exist for SBML Level 2,
versions 1 and 3.
2 TESTING A STOCHASTIC SIMULATOR
The only practical testing method is to run the simulator a large
number of times and check that the distribution of outcomes is
not significantly different from the true underlying distribution.
This can only be tested in a probabilistic way. The test suite is a
set of SBML models each with time course data for the means
and SDs of the model species. Developers may use the suite to
check that their simulators produce results that are consistent
with the SBML standard. The test suite assumes that the
simulator produces output on a regular time grid. Of course,
exact stochastic simulators naturally produce output on a non-regular
grid corresponding to individual reaction events [see
Wilkinson (2006) for further details on stochastic simulators].
However, this ‘step function’ output is easy to map onto a
regular time-grid either post hoc, or during the simulation run
itself.
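The post hoc mapping mentioned above can be sketched as follows. This is a minimal illustration, not part of the test suite: the function name `sample_on_grid` and the array layout (one recorded time per reaction event, with the state that holds from that time onwards) are assumptions made for the example. The trajectory is treated as a right-continuous step function, so an event that fires exactly on a grid point is already reflected in the value reported there.

```python
import numpy as np

def sample_on_grid(event_times, states, grid):
    """Read a step-function trajectory off a regular time grid.

    event_times : sorted times; event_times[0] is the start of the run.
    states      : states[k] is the species amount from event_times[k]
                  until the next event (right-continuous convention).
    grid        : times at which output is wanted.
    """
    event_times = np.asarray(event_times, dtype=float)
    states = np.asarray(states)
    grid = np.asarray(grid, dtype=float)
    if grid.size and grid.min() < event_times[0]:
        raise ValueError("grid starts before the first recorded time")
    # Index of the last event at or before each grid time.
    idx = np.searchsorted(event_times, grid, side="right") - 1
    return states[idx]

# Hypothetical trajectory with three reaction events after t = 0.
event_times = [0.0, 0.4, 1.7, 3.2]
states = [100, 99, 100, 98]
print(sample_on_grid(event_times, states, np.arange(5)).tolist())
# [100, 99, 100, 100, 98]
```

A simulator could equally record the current state whenever the simulated clock passes a grid point, which avoids storing the full event history.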
In order to test the output from an exact stochastic simulator
for a given SBML model (Gillespie, 1977), n independent
simulation runs of the simulator should be performed. For the
statistical tests to have reasonable power to detect subtle
problems, n should be set to at least 10 000. The sample means
and SDs of the species amounts from the simulation runs at
t = 0, 1, ..., 50 can be compared with the corresponding values in
the test suite using the statistical tests described below. Figure 1
shows the means and SDs over time for an example model that
includes an event. By comparing the output of many simulations
to the true values, we can test the stochastic simulator. The
simulated values of a particular species, X, can be tested as
follows: let $X_t^{(i)}$ be the value of $X_t$ on the $i$th run of the
simulator, where $X_t$ is the random variable representing X at
time t. Put $\mu_t = E(X_t)$ and $\sigma_t = \sqrt{\mathrm{Var}(X_t)}$. Assuming that
n is large enough for the central limit theorem to apply, we have

$$\bar{X}_t \sim N(\mu_t, \sigma_t^2/n), \quad \text{where } \bar{X}_t = \frac{1}{n} \sum_{i=1}^{n} X_t^{(i)},$$

giving

$$Z_t = \sqrt{n}\, \frac{\bar{X}_t - \mu_t}{\sigma_t} \sim N(0, 1).$$
So under the null hypothesis that the simulator is correct, the
$Z_t$ values should have a standard normal distribution. In this
case, each value lies in the range (-3, 3) with probability about
0.997. Therefore, values of $Z_t$ outside this range correspond to
evidence that the simulator is in error. The DSMTS User Guide also
describes a test for the SD.
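The mean test described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the array layout (one row per simulation run, one column per time point) and the decision to skip time points whose true SD is zero (for example a fixed initial condition at t = 0, where the statistic is undefined) are assumptions of this example, not prescriptions of the test suite. The synthetic Poisson data stand in for simulator output.

```python
import numpy as np

def mean_z_scores(runs, true_mean, true_sd):
    """Z_t = sqrt(n) * (sample mean - true mean) / true SD at each time.

    runs      : array of shape (n_runs, n_times) for one species.
    true_mean : reference means, shape (n_times,).
    true_sd   : reference SDs, shape (n_times,).
    Time points with true_sd == 0 are returned as NaN (assumption:
    such points are checked separately, e.g. for exact equality).
    """
    runs = np.asarray(runs, dtype=float)
    true_mean = np.asarray(true_mean, dtype=float)
    true_sd = np.asarray(true_sd, dtype=float)
    n = runs.shape[0]
    xbar = runs.mean(axis=0)
    z = np.full(xbar.shape, np.nan)
    ok = true_sd > 0
    z[ok] = np.sqrt(n) * (xbar[ok] - true_mean[ok]) / true_sd[ok]
    return z

def failed_time_points(z, threshold=3.0):
    """Indices of time points where |Z_t| exceeds the threshold."""
    return np.flatnonzero(np.abs(z) > threshold)

# Synthetic stand-in for simulator output: 10 000 runs, 51 time points,
# each value Poisson(10), so the true mean is 10 and the true SD sqrt(10).
rng = np.random.default_rng(1)
runs = rng.poisson(lam=10.0, size=(10_000, 51))
z = mean_z_scores(runs, np.full(51, 10.0), np.full(51, np.sqrt(10.0)))
print("time points outside (-3, 3):", failed_time_points(z).tolist())
```

Because 51 time points are tested per species, an occasional value just outside (-3, 3) can occur by chance even for a correct simulator, so isolated marginal failures are best followed up with a rerun under a different seed.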
Although the test suite is designed primarily for rigorous
testing of exact simulators, it should also prove useful to
developers of approximate simulators (Gillespie and Petzold,
2003) or hybrid simulators (Kiehl et al., 2004; Puchalka and
Kierzek, 2004). For example, one way of using the test suite to
assess the performance of an approximate simulator is to plot
the means and SDs as percentages of their true values.
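One way to compute the percentages mentioned above is sketched here. The helper name `percent_of_true` and the choice to report NaN where the true value is zero (where a percentage is undefined) are assumptions of this example.

```python
import numpy as np

def percent_of_true(sample_stat, true_stat):
    """Express sample means or SDs as percentages of the reference values.

    Both inputs are 1-D arrays over the same time grid. Entries where
    the reference value is zero are returned as NaN.
    """
    sample_stat = np.asarray(sample_stat, dtype=float)
    true_stat = np.asarray(true_stat, dtype=float)
    pct = np.full(true_stat.shape, np.nan)
    nonzero = true_stat != 0
    pct[nonzero] = 100.0 * sample_stat[nonzero] / true_stat[nonzero]
    return pct

# Hypothetical sample SDs from an approximate simulator against reference SDs.
print(percent_of_true([0.0, 2.1, 2.9], [0.0, 2.0, 3.0]))
```

Plotting the resulting array against time shows at a glance whether an approximate method drifts systematically above or below 100%.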
The DSMTS currently uses variations of three simple
models:
- The birth–death process [see Cox and Miller (1965) for
  details];
- The dimerization process (Wilkinson, 2006);
- The batch immigration-death process, of which the classic
  immigration-death process is a special case (Gillespie and
  Renshaw, 2005).
The models can be solved either analytically or, in the
case of the dimerization model, by using numerical linear
algebra. At the time of writing there are 36 variations of the
three models, and these test a variety of SBML features, such as
units, rate law interpretation, events and cell volumes. Running
the entire test suite takes about 1 h for n = 10 000 and a reasonably
fast simulator.
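As an illustration of an analytically solvable case, the sketch below evaluates the standard textbook mean and SD of the linear birth–death process with per-capita birth rate λ, per-capita death rate μ and a fixed initial population x0: E(X_t) = x0 exp((λ − μ)t) and Var(X_t) = x0 (λ + μ)/(λ − μ) exp((λ − μ)t)(exp((λ − μ)t) − 1), with Var(X_t) = 2λt x0 when λ = μ. The function name and the parameter values are illustrative assumptions and are not taken from the test suite files.

```python
import numpy as np

def birth_death_moments(t, x0, birth, death):
    """Mean and SD of a linear birth-death process started at x0.

    t     : time or array of times.
    birth : per-capita birth rate (lambda).
    death : per-capita death rate (mu).
    """
    t = np.asarray(t, dtype=float)
    r = birth - death
    mean = x0 * np.exp(r * t)
    if np.isclose(r, 0.0):
        # Critical case lambda == mu: the variance grows linearly in time.
        var = 2.0 * birth * t * x0
    else:
        # expm1 keeps exp(r t) - 1 accurate when r t is small.
        var = x0 * (birth + death) / r * np.exp(r * t) * np.expm1(r * t)
    return mean, np.sqrt(var)

# Illustrative parameters (assumed for this example).
times = np.arange(0, 51)
mean, sd = birth_death_moments(times, x0=100, birth=0.1, death=0.11)
print(mean[[0, 25, 50]], sd[[0, 25, 50]])
```

Reference curves of this kind, evaluated on the grid t = 0, 1, ..., 50, are what the sample means and SDs from the simulation runs are compared against.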
3 CONCLUSION
The test suite is already being employed by a number of
stochastic simulator developers. For example, the developers of
the Systems Biology Workbench (Hucka et al., 2002;
Vallabhajosyula and Sauro, 2007), the BASIS system
(Gillespie et al., 2006; Kirkwood et al., 2003) and COPASI
(Hoops et al., 2006) all use the test suite routinely. We have
found the test suite to be invaluable when developing our own
stochastic simulator, as it provides a simple and systematic
means with which to test many aspects of the simulator
behaviour.
ACKNOWLEDGEMENTS
Work on the DSMTS was partially funded by the BBSRC
through grants BEP 17042, BBS/B/16550, and BBC0082001.
Conflict of Interest: none declared.
REFERENCES
Cox,D.R. and Miller,H.D. (1965) The Theory of Stochastic Processes. Methuen,
London.
Gibson,M.A. and Bruck,J. (2000) Efficient exact stochastic simulation of
chemical systems with many species and many channels. J. Phys. Chem.,
104, 1876–1889.
Gillespie,C.S. and Renshaw,E. (2005) The evolution of a batch-immigration
death process subject to counts. Proc. R. Soc. A, 461, 1563–1581.
Gillespie,C.S. et al. (2006) Tools for the SBML community. Bioinformatics, 22,
628–629.
Gillespie,D.T. (1977) Exact stochastic simulation of coupled chemical reactions.
J. Phys. Chem., 81, 2340–2361.
Gillespie,D.T. and Petzold,L.R. (2003) Improved leap-size selection for accelerated
stochastic simulation. J. Chem. Phys., 119, 8229–8234.
Hoops,S. et al. (2006) COPASI – a COmplex PAthway SImulator. Bioinformatics,
22, 3067–3074.
Hucka,M. et al. (2002) The ERATO systems biology workbench: enabling
interaction and exchange between software tools for computational biology.
Pac. Symp. Biocomput., 450–461.
Hucka,M. et al. (2003) The systems biology markup language (SBML): a medium
for representation and exchange of biochemical network models.
Bioinformatics, 19, 524–531.
Kiehl,T.R. et al. (2004) Hybrid simulation of cellular behavior. Bioinformatics,
20, 316–322.
Kirkwood,T.B.L. et al. (2003) Towards an e-biology of ageing: integrating theory
and data. Nat. Rev. Mol. Cell Biol., 4, 243–249.
Puchalka,J. and Kierzek,A.M. (2004) Bridging the gap between stochastic and
deterministic regimes in the kinetic simulations of the biochemical reaction
networks. Biophys. J., 86, 1357–1372.
Vallabhajosyula,R.R. and Sauro,H.M. (2007) Stochastic simulation GUI for
biochemical networks. Bioinformatics, 23, 1859–1861.
Wilkinson,D.J. (2006) Stochastic Modelling for Systems Biology. Chapman &
Hall/CRC Press.
[Figure: two panels plotting the Mean Population and the Standard Deviation of species P and P2 against Time, from 0 to 50.]
Fig. 1. This figure represents a simple dimerization model including
SBML ‘events’. When the time parameter t becomes greater than or
equal to 25, the species populations are reset to P = 100 and P2 = 0.


