Computational toxicology using the OpenTox application programming interface and Bioclipse.

by Egon L Willighagen, Nina Jeliazkova, Barry Hardy, Roland C Grafstrom, Ola Spjuth
BMC research notes (2011)

Abstract

Background: Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications.

Findings: This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources.

Conclusions: A novel computational toxicity assessment platform was generated from integration of two open science platforms related to toxicology: Bioclipse, that combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets by the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers.

This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted
PDF and full text (HTML) versions will be made available soon.
BMC Research Notes 2011, 4:487 doi:10.1186/1756-0500-4-487
Egon L Willighagen (egon.willighagen@gmail.com)
Nina Jeliazkova (jeliazkova.nina@gmail.com)
Barry Hardy (barry.hardy@douglasconnect.com)
Roland C Grafstrom (roland.grafstrom@ki.se)
Ola Spjuth (ola.spjuth@farmbio.uu.se)
ISSN 1756-0500
Article type Short Report
Submission date 20 August 2011
Acceptance date 10 November 2011
Publication date 10 November 2011
Article URL http://www.biomedcentral.com/1756-0500/4/487
This peer-reviewed article was published immediately upon acceptance. It can be downloaded,
printed and distributed freely for any purposes (see copyright notice below).
© 2011 Willighagen et al. ; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Egon L Willighagen*1,2, Nina Jeliazkova3, Barry Hardy4, Roland C Grafström2,5, Ola Spjuth1
1 Department of Pharmaceutical Bioinformatics, Uppsala University, Uppsala, Sweden
2 Division of Molecular Toxicology, Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
3 Ideaconsult Ltd, A. Kanchev 4, Sofia 1000, Bulgaria
4 Douglas Connect, Baermeggenweg 14, 4314 Zeiningen, Switzerland
5 VTT Technical Research Center of Finland, Medical Biotechnology, FI-20521 Turku, Finland
Email: Egon L Willighagen* - egon.willighagen@gmail.com; Nina Jeliazkova - jeliazkova.nina@gmail.com; Barry Hardy -
barry.hardy@douglasconnect.com; Roland C Grafström - roland.grafstrom@ki.se; Ola Spjuth - ola.spjuth@farmbio.uu.se
* Corresponding author
Findings
We here report the establishment of a new interoperable platform for computational toxicology that is able
to dynamically discover computational services running the latest predictive algorithms and models, while
hiding technicalities by reusing a graphics-oriented workbench for the life sciences. The OECD QSAR
ToolBox [1,2] and ToxTree [3,4] are existing software packages that aggregate predictive toxicity models, but do not
integrate easily with other functionality, such as online services. Bioclipse, however, is designed to integrate
local and remote functionality [5-7]. In this paper we outline how we implemented a new platform,
integrating the OpenTox Open Standards [8] and the interactive, but scriptable Open Source workbench
for the life sciences, Bioclipse. This approach makes it possible for anyone to make new computational
toxicology models available to Bioclipse without the need to change the software source code.
Predictive toxicology is a field where knowledge from many sources needs to be integrated to provide a
weight of evidence on the toxicity of untested chemical compounds. Typical sources of information include
databases with in vivo and in vitro experimental data such as ToxCast and SuperToxic [9, 10], literature
databases summarizing adverse reactions like SIDER [11], and computational resources based on toxicity
data for other compounds including DSSTox [12]. Importantly, this information should be visualized,
preferably linked to the chemical structure of the compound, or by visualizing relevant life science data,
such as gene, protein and biological pathway information [13-15] or metabolic reactions [16]. Bioclipse was
designed to provide such interactive data analysis for the life sciences.
Moreover, predictive toxicology is an advancing science, aiming to develop new alternative testing methods,
satisfying the demanding risk assessment requirements of the European REACH guidance [17]. The
dynamic discovery of new toxicology-related data and computational methods is therefore of utmost
scientific and practical importance. The EU FP7 OpenTox project recently developed a framework that
makes the semantic integration of such new resources feasible [8].
We describe here the subsequent technological interoperation of Bioclipse and the OpenTox platform, as
implemented by the AMBIT software [18]. This short report outlines what functionality the new
combined platform provides to the toxicologist and what development is ongoing. At the core of the
interoperation lies the use of the Resource Description Framework (RDF) [19] and related Open Standards.
OpenTox uses RDF as a primary exchange format and the RDF query language SPARQL [20] to discover
data sets, algorithms and models. Bioclipse was recently extended to support these standards [21],
simplifying the interoperation task with OpenTox.
We outline three applications that exemplify how the technologies used make this interoperability
possible, starting with a computational toxicology example. Three technologies drive the
interoperability. First, the SPARQL RDF query language is used to discover functionality on the
OpenTox network. Second, the OpenTox web services are used for remote computation. Finally, all
graphical user interfaces use a new Bioclipse Scripting Language (BSL) [6] extension to interact with
OpenTox servers, so that all interaction can also be scripted and automated.
Computational Toxicology
Figure 1 shows how the interoperability of Bioclipse with the OpenTox API is designed, and in particular
how it was used to extend the molecular descriptor calculation functionality in Bioclipse described
previously [22]. This functionality can be used to calculate properties such as logP and pKa, important to
various aspects of toxicity, including membrane transport and receptor binding. Knowledge about such
properties can be used under the European REACH regulation. For example, predicted physical and
chemical properties can, under certain conditions, complement toxicity testing using animal experiments,
and as such, calculation of such descriptors is increasingly relevant.
Bioclipse dynamically discovers descriptor algorithms exposed via the OpenTox servers, using the OpenTox
ontology service's SPARQL endpoint. This SPARQL endpoint functions as a registry of available
computational services on the OpenTox network, similar to the role of BioCatalogue [23]. These services
are described with the OpenTox ontology, which is available as a Web Ontology Language [24] document at
http://opentox.org/api/1_1/opentox.owl and discussed in detail in reference [8]. Using the SPARQL query
language Bioclipse can retrieve a list of available services. Moreover, when a new descriptor algorithm or
model is registered on the OpenTox ontology service, it will automatically be picked up by Bioclipse.
Figure 2 shows several discovered OpenTox descriptor algorithms, along with algorithms from other local
(CDK [25]) and remote (CDK REST) providers. Using this approach, Bioclipse has access to the most
recent descriptors relevant to toxicity predictions.
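The discovery step can be pictured as a SPARQL SELECT query followed by a small amount of result handling. The sketch below is an illustration only, not the Bioclipse implementation: the query text, the ot: prefix URI, the use of dc:title, and the helper names buildAlgorithmQuery and bindingsToRows are assumptions made for this example. The sample response is made up, in the shape of the W3C SPARQL Query Results JSON Format.

```javascript
// Hypothetical discovery query. The ot: namespace and the ot:Algorithm class
// are assumptions for illustration; check the OpenTox ontology before use.
function buildAlgorithmQuery(limit) {
  return [
    "PREFIX ot: <http://www.opentox.org/api/1.1#>",
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>",
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
    "SELECT ?algo ?title WHERE {",
    "  ?algo rdf:type ot:Algorithm .",
    "  OPTIONAL { ?algo dc:title ?title }",
    "}",
    "LIMIT " + limit
  ].join("\n");
}

// Convert a SPARQL JSON result document into plain row objects.
function bindingsToRows(response) {
  const vars = response.head.vars;
  return response.results.bindings.map(function (binding) {
    const row = {};
    vars.forEach(function (name) {
      // Unbound OPTIONAL variables are absent from a binding.
      row[name] = binding[name] ? binding[name].value : null;
    });
    return row;
  });
}

// Made-up response with placeholder URIs (not real OpenTox resources).
const sampleResponse = {
  head: { vars: ["algo", "title"] },
  results: {
    bindings: [
      {
        algo: { type: "uri", value: "http://example.org/algorithm/A" },
        title: { type: "literal", value: "Descriptor A" }
      },
      { algo: { type: "uri", value: "http://example.org/algorithm/B" } }
    ]
  }
};

const rows = bindingsToRows(sampleResponse);
console.log(buildAlgorithmQuery(10));
console.log(rows);
```

Because the client only depends on the query and the result format, a newly registered algorithm appears as one more row without any change to the client code.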
OpenTox provides web services to calculate a descriptor value for a given molecule. Using the linked
resources idea of the semantic web, the descriptors discovered via the ontology server can be invoked via
Bioclipse directly. As such, OpenTox-provided descriptor calculations can be mixed with descriptor
calculations local to Bioclipse, or from other remote computational services, as described before [22]. This
creates a flexible application for the integration of numerical input for statistical modeling of
toxicologically relevant end points, as well as the comparison of various predictive models for a more
balanced property analysis.
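Mixing providers amounts to joining per-molecule descriptor values into a single matrix for modeling. A minimal sketch of that join follows, assuming each provider returns an object that maps a descriptor name to an array of values in the same molecule order. The function name mergeDescriptorColumns, the provider labels, and all numbers are hypothetical and are not Bioclipse or OpenTox output.

```javascript
// Join descriptor columns from several providers into one table.
// providers: { providerName: { descriptorName: [value per molecule] } }
function mergeDescriptorColumns(moleculeIds, providers) {
  const header = ["molecule"];
  const rows = moleculeIds.map(function (id) { return [id]; });
  Object.keys(providers).forEach(function (providerName) {
    const columns = providers[providerName];
    Object.keys(columns).forEach(function (descriptorName) {
      const values = columns[descriptorName];
      if (values.length !== moleculeIds.length) {
        throw new Error(
          "Length mismatch for " + providerName + ":" + descriptorName
        );
      }
      // Prefix the column with its provider so the origin stays traceable.
      header.push(providerName + ":" + descriptorName);
      values.forEach(function (value, i) { rows[i].push(value); });
    });
  });
  return { header: header, rows: rows };
}

// Made-up values for two placeholder molecules.
const table = mergeDescriptorColumns(
  ["mol1", "mol2"],
  {
    "local": { "descriptorA": [10, 18] },
    "remote": { "descriptorB": [0.5, 2.5] }
  }
);
console.log(table.header.join("\t"));
table.rows.forEach(function (row) { console.log(row.join("\t")); });
```

Keeping the provider name in the column header is one way to compare predictions of the same property from different sources side by side.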
All functionality for remote computing on the OpenTox network is also available as BSL scripting
commands, allowing all OpenTox interoperation with the Bioclipse graphical user interface to be replicated
using BSL scripts. Table 1 shows the BSL commands for service and data discovery and the invocation of
remote services, under the categories Querying and Computation, respectively.
Data Sharing
Using a second use case, data sharing, we explain how all graphical interoperation relies on a BSL script
extension. For example, Figure 3 shows the Bioclipse dialog for uploading a small data set with ten
neurotoxins to an OpenTox server (see Additional file 1). This dialog asks which OpenTox server to upload
to (the Ambit2 server is selected, http://apps.ideaconsult.net:8080/ambit2/), a title under which this data
set will be available ("Ten neurotoxins found in Wikipedia"), and the data license or waiver under which
the data will be available to others. Figure 3 indicates that the Creative Commons Zero waiver [26] was
selected. Other options include the ODC Public Domain Dedication and Licence [27], Open Database
License [28], and the Open Data Commons Attribution License [29]. Optionally, the user can specify a web
location for a custom license agreement under which the data is available, though we encourage users to
select a standard license.
Technically, the dialog makes use of the script commands createDataSet(service, molecules),
setDatasetLicense(datasetURI, licenseURI), and setDatasetTitle(datasetURI, title) (see Table 1). The
latter two methods use the data set Uniform Resource Identifier (URI) returned by the first method.
When the upload has finished, the resulting OpenTox web page is opened in a browser window in Bioclipse
(see Figure 4).
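The order of the three script commands can be sketched outside Bioclipse with a stand-in object. In the sketch below, fakeOpenTox is a hypothetical in-memory stub, not the Bioclipse opentox manager. Only the three command names and their argument order are taken from the text above; the return values, the generated dataset URI, the inspect helper, and the placeholder server and SMILES strings are assumptions for illustration.

```javascript
// Hypothetical in-memory stand-in for the opentox script manager.
function createFakeOpenTox() {
  const datasets = {};
  let counter = 0;

  function requireDataset(uri) {
    if (!datasets[uri]) { throw new Error("Unknown dataset: " + uri); }
    return datasets[uri];
  }

  return {
    // Returns the URI of the new data set (made-up URI scheme).
    createDataSet: function (service, molecules) {
      counter += 1;
      const uri = service + "dataset/" + counter;
      datasets[uri] = { molecules: molecules.slice(), license: null, title: null };
      return uri;
    },
    setDatasetLicense: function (datasetURI, licenseURI) {
      requireDataset(datasetURI).license = licenseURI;
    },
    setDatasetTitle: function (datasetURI, title) {
      requireDataset(datasetURI).title = title;
    },
    // Helper that exists only in this stub, to look at the stored state.
    inspect: function (datasetURI) { return requireDataset(datasetURI); }
  };
}

const fakeOpenTox = createFakeOpenTox();
const service = "http://example.org/ambit2/"; // placeholder server

// Step 1: upload the structures and keep the returned data set URI.
const datasetURI = fakeOpenTox.createDataSet(service, ["CCO", "CC(=O)O"]);
// Steps 2 and 3: annotate the data set through that URI.
fakeOpenTox.setDatasetLicense(
  datasetURI, "http://creativecommons.org/publicdomain/zero/1.0/"
);
fakeOpenTox.setDatasetTitle(datasetURI, "Example data set");

console.log(datasetURI, fakeOpenTox.inspect(datasetURI));
```

The point of the sketch is the dependency between the calls: the license and title can only be set once the first call has returned a data set URI.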
This use case shows how the Bioclipse-OpenTox integration takes advantage of the fact that
Bioclipse has all graphical user interface (GUI) functionality matched by a scripted equivalent. Using
the BSL directly allows interaction with the OpenTox network to be automated and combined with other
Bioclipse functionality into larger workflows, and makes it easier to share procedures with others, using
social scientific sites like MyExperiment [30]. An example BSL script for calculating molecular descriptors
combines OpenTox functionality with cheminformatics functionality provided by the cdk script extensions
(also available as Additional file 2):
// requires an unspecified Bioclipse development version
// bioclipse.requireVersion("2.6")

// OpenTox server for computation and ontology service for discovery
service = "http://apps.ideaconsult.net:8080/ambit2/";
serviceSPARQL = "http://apps.ideaconsult.net:8080/ontology/";

// discover the available descriptor algorithms
stringMat = opentox.listDescriptors(serviceSPARQL);
stringMat.getColumn("algo"); // returns the descriptor services
stringMat.getColumn("desc"); // returns the BO entries
descriptor = stringMat.get(1,1); // pick one discovered algorithm URI

// build the list of molecules from SMILES strings
molecules = cdk.createMoleculeList();
molecules.add(
  cdk.fromSMILES("CC(=O)C1=CC=C(C=C1)N")
);
molecules.add(
  cdk.fromSMILES("C1=CC=C(C(=C1)CC(=O)O)NC2=C(C=CC=C2Cl)Cl")
);

// calculate the descriptor remotely and print the values
js.say(
  descriptor + " - " +
  opentox.calculateDescriptor(service, descriptor, molecules)
);
This will generate the following output to the JavaScript console:
http://apps.ideaconsult.net:8080/ambit2/algorithm/org.openscience.cdk.qsar.descriptors.molecular.XLogPDescriptor - [0.11900000274181366, 2.2190001010894775]
Table 1 shows an overview of the available BSL commands for uploading data to and downloading data
from OpenTox servers under the heading Data exchange.
Authentication
The third demonstration of Bioclipse-OpenTox interoperability is the support for accessing protected
resources within the OpenTox network. Despite the preferences of the authors, we acknowledge that not all
scientific data will be Open Data. As such, authentication and authorization (A&A) are important features
of data access. OpenTox implements both aspects, and provides web services for A&A, allowing users to
log in and out of OpenTox applications, accompanied by policy-based specification of OpenTox resource
access permissions. Additionally, the same mechanism is used to restrict access to calculation
procedures, making it possible to expose software with commercial licenses as protected OpenTox resources.
Bioclipse was extended to support OpenTox authentication, allowing the OpenTox servers to properly
authorize user access to particular web services and data sets. The OpenTox account information is
registered with the Bioclipse keyring system, which centralizes logging in to and out of remote services and provides
the graphical user interface for adding a new OpenTox account and for logging in and out. The corresponding
script commands for authentication are given in the Authentication category in Table 1. Interested people
can create a free account at http://opentox.org/join_form.
Discussion
We have described here an interoperability advance, enabling users to interactively explore and evaluate the
toxicity properties of molecules based on a semantic web approach to toxicology resources. The integration
into Bioclipse makes various components of the OpenTox platform available to the user, both via the
graphical user interface as well as via the Bioclipse Scripting Language. The Bioclipse-OpenTox plugin
makes it possible to upload data sets to and download them from any OpenTox server, calculate molecular
descriptors, and apply predictive toxicology models on molecular structures. All functionality has support
for user authentication using the OpenTox-adopted OpenSSO technology. Other components of OpenTox,
like model building and validation, have not been added yet, as Bioclipse currently does not have a clear
GUI for such functionality. This is being worked on, but is outside the scope of this report.
The presented aspects make this integration fairly unique: the solution is capable of
dynamically discovering new services in the OpenTox network when it starts, which differentiates the
software from specialized software like ToxTree and the OECD QSAR ToolBox. These tools aggregate
several predictive models, but need to be updated manually by the developers for each new model.
However, it is noted that these tools can also be extended to support the OpenTox platform.
An added value is that updates to computational modules are only done on the server side, so that the
client software (Bioclipse) does not need to be updated; a feature in common with web-based solutions like
ToxPredict [31]. The scripting functionality makes it easy to automate data workflows, as do workflow
applications such as Taverna [32] and KNIME (http://knime.org), but the combination with the rich
Bioclipse user interface makes it possible at the same time to work with OpenTox interactively.
The calculation results are cached by the OpenTox dataset service, which avoids time-consuming
processing if the same calculation on the same dataset is requested more than once. Users of the integrated
Bioclipse-OpenTox environment do not, therefore, need to care about the performance on their own
computer, though we are also exploring the options to have Bioclipse itself run an OpenTox server. The
latter is technically possible, and would convert the integrated platform into a standalone application that
does not require web access.
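The effect of such server-side caching can be illustrated with a memoizing wrapper keyed on the algorithm and dataset URIs. This is a sketch of the general idea under assumed names (createCachedCalculator, slowCalculate) and placeholder URIs; it does not describe how the OpenTox dataset service implements its cache.

```javascript
// Wrap a calculation so that repeated (algorithm, dataset) requests
// are answered from memory instead of being recomputed.
function createCachedCalculator(calculate) {
  const cache = new Map();
  let misses = 0;

  function run(algorithmURI, datasetURI) {
    // JSON encoding keeps the two URIs unambiguous within one key.
    const key = JSON.stringify([algorithmURI, datasetURI]);
    if (!cache.has(key)) {
      misses += 1;
      cache.set(key, calculate(algorithmURI, datasetURI));
    }
    return cache.get(key);
  }

  // Number of times the underlying calculation actually ran.
  run.misses = function () { return misses; };
  return run;
}

// Stand-in for an expensive remote calculation (made-up result string).
function slowCalculate(algorithmURI, datasetURI) {
  return "result of " + algorithmURI + " on " + datasetURI;
}

const calculateCached = createCachedCalculator(slowCalculate);
const first = calculateCached("http://example.org/algorithm/A", "http://example.org/dataset/1");
const second = calculateCached("http://example.org/algorithm/A", "http://example.org/dataset/1");
const third = calculateCached("http://example.org/algorithm/A", "http://example.org/dataset/2");
console.log(first === second, calculateCached.misses());
```

In this sketch the second request is served from the cache, so the underlying calculation runs only for the two distinct (algorithm, dataset) pairs.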
From a technological perspective, the Bioclipse-OpenTox integration relies on semantic web technologies,
which are seeing significant adoption in other areas of the life sciences too, including drug discovery, text
mining, and neurosciences [33-35]. The OpenTox platform demonstrated the provision of a simple but
well-defined and consistent ontology for the interaction with their services, providing functionality for both
service discovery and service invocation. The SADI framework is the only known semantic alternative [36],
but does not currently provide the same level of computational toxicology services as OpenTox does.
However, while the integration is greatly simplified and semantically defines what services are available and
what they do, the technologies used neither solve the problem of the chemical validity of the molecular structures
that are sent around, nor semantically define and specify in detail how to interpret the
computational results of toxicity predictions. The first problem reflects the fact that even with
explicit meaning we can make incorrect claims. For example, we can always define a triple stating that
:water :isToxicAtLowConcentrationsTo :human, by using ontologies for all aspects, but that would not
make it true. Semantic technologies are not about correctness. Instead, they make it much easier to find
inconsistencies between knowledge bases. The same argument applies to semantically marked up molecular
structures and other data passed between Bioclipse and the OpenTox cloud (cf. Figure 1).
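How explicit semantics help expose disagreement, rather than guarantee truth, can be shown with a toy comparison of two sets of triples. The sketch assumes a hypothetical rule that the predicates :isToxicAtLowConcentrationsTo and :isNotToxicAtLowConcentrationsTo contradict each other. The second predicate, both knowledge bases, the :compoundX resource, and the function name findContradictions are invented for this example.

```javascript
// Hypothetical declaration of which predicates contradict each other.
const CONTRADICTS = {
  ":isToxicAtLowConcentrationsTo": ":isNotToxicAtLowConcentrationsTo",
  ":isNotToxicAtLowConcentrationsTo": ":isToxicAtLowConcentrationsTo"
};

// Each triple is [subject, predicate, object]. Two claims conflict when
// they share subject and object and their predicates are declared contradictory.
function findContradictions(kbA, kbB) {
  const conflicts = [];
  kbA.forEach(function (a) {
    kbB.forEach(function (b) {
      const sameEnds = a[0] === b[0] && a[2] === b[2];
      if (sameEnds && CONTRADICTS[a[1]] === b[1]) {
        conflicts.push({ claimA: a, claimB: b });
      }
    });
  });
  return conflicts;
}

// Knowledge base A contains the (false) claim from the text.
const kbA = [[":water", ":isToxicAtLowConcentrationsTo", ":human"]];
const kbB = [
  [":water", ":isNotToxicAtLowConcentrationsTo", ":human"],
  [":compoundX", ":isToxicAtLowConcentrationsTo", ":human"]
];

const conflicts = findContradictions(kbA, kbB);
console.log(conflicts);
```

Neither knowledge base is shown to be right by this check; the shared vocabulary only makes the disagreement mechanically detectable.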
An example of the second problem is that various services can indicate that a compound is mutagenic or
carcinogenic, but express that statement in different ways. One service may return a binary yes/no answer,
while another returns a more detailed answer, such as for which cell line or organism the prediction is
made. Such semantic integration is currently outside the scope of this Bioclipse-OpenTox interoperability,
but it is not a problem unique to our approach either.
To address these issues, the community needs to develop better capabilities to link automatically and
reliably the various concepts in toxicology, such as links between chemical names and structures and links
to toxicities based on current biological knowledge on effects, targets and pathways. The platform is ready
for such semantic integration, but the community needs to develop a common language, which will be
enabled through the creation of a public set of linked, harmonized and interoperable ontologies satisfying
the predictive toxicology use cases of the future, supporting an integrated data analysis.
Availability and requirements
• Project name: Bioclipse-OpenTox
• Project home page: http://www.bioclipse.net/opentox/
• Operating system(s): Platform independent
• Programming language: Java
• Other requirements: Java 6 or higher
• License: Eclipse Public License
• Any restrictions to use by non-academics: None
List of abbreviations
A&A: Authorization and Authentication; API: Application Programming Interface; BSL: Bioclipse
Scripting Language; CDK: Chemistry Development Kit; EU: European Union; FP7: Seventh Framework
Programme; GUI: Graphical User Interface; OECD: Organisation for Economic Co-operation and
Development; QSAR: Quantitative Structure-Activity Relationship; RDF: Resource Description
Framework; REACH: Registration, Evaluation, Authorisation and Restriction of Chemical substances;
REST: Representational State Transfer; SPARQL: SPARQL Protocol and RDF Query Language; URI:
Uniform Resource Identifier.
Competing interests
OS declares interest in Genetta Soft AB, Sweden. NJ declares interest in Ideaconsult Ltd., Bulgaria.
Authors' contributions
EW initiated the project at Uppsala University. OS and EW integrated the two platforms. NJ worked on
OpenTox to improve internal consistency. BH and RG encouraged and discussed the work with co-authors
and users. All authors contributed to the writing of the paper and approved the final version.
Acknowledgements
This research was funded by a KoF grant from Uppsala University (KoF 07), the Swedish VR-M
(04X-05957), the Swedish Cancer and Allergy Fund, the Swedish Research Council, the Swedish Fund for
Research without Animal Experiments, OpenTox through the EU Seventh Framework Programme
HEALTH-2007-1.3-3 (Health-F5-2008-200787), COLIPA, and ToxBank through the EU Seventh
Framework Programme HEALTH-2010-4.2.9 Alternative Testing Strategies (Health-F5-2010-267042).
Jonathan Alvarsson is acknowledged for his Bioclipse keyring extension which is used for the OpenTox
authentication integration.
References
1. Diderichs R: Tools for Category Formation and Read-Across: Overview of the OECD (Q)SAR Application
Toolbox, The Royal Society of Chemistry 2010, chap. 16:385-407.
2. QSAR ToolBox [http://www.qsartoolbox.org/].
3. Patlewicz G, Jeliazkova N, Safford RJ, Worth AP, Aleksiev B: An evaluation of the implementation of
the Cramer classification scheme in the Toxtree software. SAR and QSAR in environmental research
2008, 19(5-6):495-524.
4. ToxTree [http://toxtree.sourceforge.net/].
5. Spjuth O, Helmus T, Willighagen E, Kuhn S, Eklund M, Wagener J, Rust PM, Steinbeck C, Wikberg J:
Bioclipse: An open source workbench for chemo- and bioinformatics. BMC Bioinformatics 2007, 8.
6. Spjuth O, Alvarsson J, Berg A, Eklund M, Kuhn S, Mäsak C, Torrance G, Wagener J, Willighagen E,
Steinbeck C, Wikberg J: Bioclipse 2: A scriptable integration platform for the life sciences. BMC
Bioinformatics 2009, 10:397.
7. Spjuth O, Eklund M, Ahlberg Helgee E, Boyer S, Carlsson L: Integrated Decision Support for Assessing
Chemical Liabilities. Journal of Chemical Information and Modeling 2011, 51(8):1840-1847.
8. Hardy B, Douglas N, Helma C, Rautenberg M, Jeliazkova N, Jeliazkov V, Nikolova I, Benigni R,
Tcheremenskaia O, Kramer S, Girschick T, Buchwald F, Wicker J, Karwath A, Gutlein M, Maunz A, Sarimveis
H, Melagraki G, Afantitis A, Sopasakis P, Gallagher D, Poroikov V, Filimonov D, Zakharov A, Lagunin A,
Gloriozova Ta, Novikov S, Skvortsova N, Druzhilovsky D, Chawla S, Ghosh I, Ray S, Patel H, Escher S:
Collaborative development of predictive toxicology applications. Journal of Cheminformatics 2010,
2:7.
9. Knudsen TB, Houck KA, Sipes NS, Singh AV, Judson RS, Martin MT, Weissman A, Kleinstreuer NC,
Mortensen HM, Reif DM, Rabinowitz JR, Setzer RW, Richard AM, Dix DJ, Kavlock RJ: Activity profiles of
309 ToxCast™ chemicals evaluated across 292 biochemical targets. Toxicology 2011, 282(1-2):1-15.
10. Schmidt U, Struck S, Gruening B, Hossbach J, Jaeger IS, Parol R, Lindequist U, Teuscher E, Preissner R:
SuperToxic: a comprehensive database of toxic compounds. Nucleic Acids Research 2009, 37(suppl
1):D295-D299.
11. Kuhn M, Campillos M, Letunic I, Jensen LJJ, Bork P: A side effect resource to capture phenotypic
effects of drugs. Molecular systems biology 2010, 6(343).
12. Williams-DeVane CR, Wolf MA, Richard AM: DSSTox chemical-index files for exposure-related
experiments in ArrayExpress and Gene Expression Omnibus: enabling toxico-chemogenomics
data linkages. Bioinformatics 2009, 25(5):692-694.
13. Huang DW, Sherman BT, Lempicki RA: Systematic and integrative analysis of large gene lists using
DAVID bioinformatics resources. Nature protocols 2009, 4:44-57.
14. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M: KEGG: Kyoto Encyclopedia of Genes and
Genomes. Nucleic acids research 1999, 27:29-34.
15. Kelder T, Pico AR, Hanspers K, van Iersel MP, Evelo C, Conklin BR: Mining Biological Pathways Using
WikiPathways Web Services. PLoS ONE 2009, 4(7):e6447+.
16. Rydberg P, Gloriam DE, Olsen L: The SMARTCyp cytochrome P450 metabolism prediction server.
Bioinformatics 2010, 26(23):2988-2989.
17. European Parliament C: Regulation (EC) No 1907/2006 of the European Parliament and of the
Council of 18 December 2006 concerning the Registration, Evaluation, Authorisation and
Restriction of Chemicals (REACH), establishing a European Chemicals Agency, amending
Directive 1999/45/EC and repealing Council Regulation (EEC) No 793/93 and Commission
Regulation (EC) No 1488/94 as well as Council Directive 76/769/EEC and Commission
Directives 91/155/EEC, 93/67/EEC, 93/105/EC and 2000/21/EC. Tech. rep. 2006,
[http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:32006R1907:en:NOT].
18. Jeliazkova N, Jeliazkov V: AMBIT RESTful web services: an implementation of the OpenTox
application programming interface. Journal of Cheminformatics 2011, 3:18.
19. Carroll JJ, Klyne G: Resource Description Framework (RDF): Concepts and Abstract Syntax 2004,
[http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/].
20. Prud'hommeaux E, Seaborne A: SPARQL Query Language for RDF. Tech. rep., World-Wide-Web
Consortium 2008, [http://www.w3.org/TR/rdf-sparql-query/].
21. Willighagen E, Alvarsson J, Andersson A, Eklund M, Lampa S, Lapins M, Spjuth O, Wikberg J: Linking the
Resource Description Framework to cheminformatics and proteochemometrics. Journal of
Biomedical Semantics 2011, 2(Suppl 1):S6.
22. Spjuth O, Willighagen E, Guha R, Eklund M, Wikberg J: Towards interoperable and reproducible
QSAR analyses: Exchange of datasets. Journal of Cheminformatics 2010, 2:5.
23. Bhagat J, Tanoh F, Nzuobontane E, Laurent T, Orlowski J, Roos M, Wolstencroft K, Aleksejevs S, Stevens R,
Pettifer S, Lopez R, Goble CA: BioCatalogue: a universal catalogue of web services for the life
sciences. Nucleic Acids Research 2010, 38(suppl 2):W689-W694, [http://dx.doi.org/10.1093/nar/gkq394].
24. W3C OWL Working Group: OWL 2 Web Ontology Language Document Overview. Tech. rep., W3C
2009, [http://www.w3.org/TR/2009/REC-owl2-overview-20091027/].
25. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL: Recent developments of the
chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics.
Current pharmaceutical design 2006, 12(17):2111-2120.
26. CC0 1.0 Universal Public Domain Dedication [http://creativecommons.org/publicdomain/zero/1.0/].
27. ODC Public Domain Dedication and Licence 1.0 [http://www.opendatacommons.org/licenses/pddl/1-0/].
28. Open Database License 1.0 [http://opendatacommons.org/licenses/odbl/1.0/].
29. Open Data Commons Attribution License 1.0 [http://opendatacommons.org/licenses/by/1.0/].
30. Goble CA, Bhagat J, Aleksejevs S, Cruickshank D, Michaelides D, Newman D, Borkum M, Bechhofer S, Roos
M, Li P, De Roure D: myExperiment: a repository and social network for the sharing of
bioinformatics workflows. Nucleic Acids Res 2010, 38 Suppl:W677-82.
31. Ideaconsult Ltd: ToxPredict [http://toxpredict.org/].
32. Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR, Wipat A, Li
P: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics
2004, 20(17):3045-3054.
33. Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, Doherty D, Forsberg K, Gao Y, Kashyap
V, Kinoshita J, Luciano J, Marshall MS, Ogbuji C, Rees J, Stephens S, Wong G, Wu E, Zaccagnini D,
Hongsermeier T, Neumann E, Herman I, Cheung KH: Advancing translational research with the
Semantic Web. BMC Bioinformatics 2007, 8(Suppl 3):S2.
34. Splendiani A, Burger A, Paschke A, Romano P, Marshall M: Biomedical semantics in the Semantic Web.
Journal of Biomedical Semantics 2011, 2(Suppl 1):S1.
35. Willighagen EL, Brändle MP: Resource description framework technologies in chemistry. Journal of
Cheminformatics 2011, 3:15.
36. Chepelev L, Dumontier M: Semantic Web integration of cheminformatics resources with the SADI
framework. Journal of Cheminformatics 2011, 3:16.
Figures
Figure 1 - An overview of the Bioclipse QSAR and OpenTox integration.
Toxicological properties of molecules can be calculated in Bioclipse using the online computational services
within the OpenTox cloud, in parallel to local services. When the user calculates these properties, Bioclipse
will first query local and online service providers for available functionality. Example services in the
OpenTox cloud are the ToxTree toxicology prediction models. The OpenTox cloud is queried by Bioclipse
internally using the SPARQL query language. Once the user has selected the toxicological properties of
interest (see Figure 2), these will be calculated by Bioclipse. Here, REST technologies are used to perform
this computation in the OpenTox cloud. The computed results can then be used in Bioclipse.
Figure 2 - Integration of OpenTox descriptors in Bioclipse QSAR.
Molecular descriptors are widely used in computational toxicology models. This screenshot from Bioclipse
QSAR shows descriptors discovered on the Internet (providers: OpenTox and CDK REST) in combination
with local software (provider: CDK).
Figure 3 - Graphical user interface for uploading data to OpenTox.
Sharing new toxicological data about molecular structures can be done by uploading the data to an
OpenTox server. This Bioclipse dialog shows the selected MDL SD file with ten neurotoxins (neurotoxins.sdf)
to be shared, the OpenTox server to upload to (here the Ambit2 server), a title for the data set, and
a license (see main text). Clicking the Finish button will upload the structures and open a web browser
window in Bioclipse with the resulting online data set (see Figure 4).
Figure 4 - OpenTox web page showing uploaded data.
Screenshot of Bioclipse showing a web browser window with the neurotoxins data hosted on the Ambit2
OpenTox server after the upload, as shown in Figure 3 (see
http://apps.ideaconsult.net:8080/ambit2/dataset/619517).
Tables
Table 1 - BSL script commands for interacting with the OpenTox platform.
Command(parameters)                                       Description

Querying
listModels(service)                                       Lists the predictive models available from the given service.
getFeatureInfo(ontologyServer, feature)                   Returns information about a particular molecular feature (property).
getFeatureInfo(ontologyServer, features)                  Returns information about a set of molecular features.
getModelInfo(ontologyServer, model)                       Returns information for a computational model.
getModelInfo(ontologyServer, models)                      Returns information for a list of computational models.
getAlgorithmInfo(ontologyServer, algorithm)               Returns information for a computational algorithm.
getAlgorithmInfo(ontologyServer, algorithms)              Returns information for a list of computational algorithms.
listAlgorithms(ontologyServer)                            Returns a list of algorithms.
listDescriptors(ontologyServer)                           Returns a list of descriptor algorithms.
listDataSets(service)                                     Returns the data sets available at the given OpenTox server.
searchDataSets(ontologyServer, query)                     Returns matching data sets using a free text search.
search(service, inchi)                                    Returns matching structures based on the InChI given.
search(service, molecule)                                 Returns matching structures based on the molecule given.

Computation
calculateDescriptor(service, descriptor, molecules)       Calculates a descriptor value for a set of molecules.
calculateDescriptor(service, descriptor, molecule)        Calculates a descriptor value for a single molecule.
predictWithModel(service, model, molecules)               Predicts modeled properties for the given list of molecules.
predictWithModel(service, model, molecule)                Predicts modeled properties for the given molecule.

Data exchange
createDataset(service)                                    Creates a new data set on an OpenTox server.
createDataset(service, molecules)                         Creates a new data set on an OpenTox server and adds the given molecules.
createDataset(service, molecule)                          Creates a new data set on an OpenTox server and adds a single molecule.
addMolecule(dataset, mol)                                 Adds a molecule to an existing data set.
addMolecules(dataset, molecules)                          Adds a list of molecules to an existing data set.
deleteDataset(dataset)                                    Deletes a data set.
downloadCompoundAsMDLMolfile(service, dataset, molecule)  Downloads a molecule from a data set as an MDL molfile.
downloadDataSetAsMDLSDfile(service, dataset, filename)    Downloads a complete data set as an MDL SD file and saves it to a local file in the Bioclipse workspace.
listCompounds(service, dataset)                           Lists the molecules in a data set.

Authentication
login(accountname, password)                              Authenticates the user with OpenSSO and logs in to the OpenTox network.
logout()                                                  Logs out from the OpenTox network.
getToken()                                                Returns a security token when Bioclipse is logged in on the OpenTox network.
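The data exchange commands in Table 1 can be combined into a simple round trip: create a data set, add molecules, list them, and delete the data set again. The following is only a rough sketch of that call sequence. The command names and parameter order are taken from Table 1; everything else is an assumption made for illustration: the `createOpenToxStub` helper is a hypothetical in-memory stand-in for the Bioclipse `opentox` manager so that the snippet runs outside Bioclipse, molecules are represented as plain SMILES strings, the return types are guesses, and the service URL is a placeholder.

```javascript
// Hypothetical in-memory stand-in for the Bioclipse `opentox` manager.
// Only the command names and parameter order come from Table 1; the
// return types and the URI scheme below are assumptions.
function createOpenToxStub() {
  const store = new Map(); // data set URI -> array of molecules
  let counter = 0;

  // Returns the molecule array of a data set, or fails if it is unknown.
  function lookup(dataset) {
    if (!store.has(dataset)) {
      throw new Error("Unknown data set: " + dataset);
    }
    return store.get(dataset);
  }

  return {
    createDataset(service, molecules = []) {
      counter += 1;
      const uri = service + "dataset/" + counter;
      store.set(uri, molecules.slice());
      return uri;
    },
    addMolecule(dataset, mol) {
      lookup(dataset).push(mol);
    },
    addMolecules(dataset, molecules) {
      lookup(dataset).push(...molecules);
    },
    listCompounds(service, dataset) {
      return lookup(dataset).slice();
    },
    deleteDataset(dataset) {
      return store.delete(dataset);
    },
  };
}

const stub = createOpenToxStub();
const exampleService = "http://example.org/opentox/"; // placeholder URL

// Create a data set with one molecule, then extend it.
const datasetUri = stub.createDataset(exampleService, ["CCO"]);
stub.addMolecule(datasetUri, "CC(=O)O");
stub.addMolecules(datasetUri, ["c1ccccc1", "CN"]);

// List the contents, then clean up.
const compounds = stub.listCompounds(exampleService, datasetUri);
console.log(datasetUri, compounds);
stub.deleteDataset(datasetUri);
```

Inside Bioclipse the stub would be replaced by the real `opentox` manager, and the molecules would be Bioclipse molecule objects rather than strings.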
Additional files
Additional file 1 - The structures of ten neurotoxins.
Filename: neurotoxins.sdf (format: Symyx SD file).
Additional file 2 - Bioclipse Scripting Language script to calculate a molecular descriptor.
Filename: calculateADescriptor.js (format: JavaScript source code file).
Bioclipse Scripting Language script to calculate the first molecular descriptor it finds on the OpenTox
server Ambit2 for two structures created from the molecular line notation format SMILES. A similar script
is available from MyExperiment at http://www.myexperiment.org/workflows/1646
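The logic of the script described above can be sketched as follows. This is a minimal illustration and not the contents of calculateADescriptor.js. The `listDescriptors` and `calculateDescriptor` signatures follow Table 1, but the `opentox` and `cdk` objects defined here are hypothetical stand-ins so that the snippet runs outside Bioclipse, and the service URLs, descriptor identifiers, SMILES strings, and returned values are all placeholders.

```javascript
// Hypothetical stand-ins for the Bioclipse managers; inside Bioclipse the
// real `opentox` and `cdk` managers would be used instead.
const cdk = {
  // Assumption: builds a molecule object from a SMILES string.
  fromSMILES: (smiles) => ({ smiles }),
};

const opentox = {
  // Signature from Table 1: listDescriptors(ontologyServer).
  // The stub returns two made-up descriptor identifiers.
  listDescriptors: (ontologyServer) => [
    ontologyServer + "algorithm/placeholder-descriptor-1",
    ontologyServer + "algorithm/placeholder-descriptor-2",
  ],
  // Signature from Table 1: calculateDescriptor(service, descriptor, molecules).
  // The stub returns the SMILES length as a dummy value per molecule.
  calculateDescriptor: (service, descriptor, molecules) =>
    molecules.map((mol) => mol.smiles.length),
};

// Placeholder URLs, not real endpoints.
const service = "http://example.org/opentox/";
const ontologyServer = "http://example.org/ontology/";

// Picks the first descriptor the server lists and calculates it for
// molecules created from the given SMILES strings.
function calculateFirstDescriptor(smilesList) {
  const descriptors = opentox.listDescriptors(ontologyServer);
  if (descriptors.length === 0) {
    throw new Error("No descriptors found at " + ontologyServer);
  }
  const molecules = smilesList.map((s) => cdk.fromSMILES(s));
  const values = opentox.calculateDescriptor(service, descriptors[0], molecules);
  return { descriptor: descriptors[0], values };
}

const result = calculateFirstDescriptor(["CCO", "c1ccccc1"]);
console.log(result.descriptor, result.values);
```

Separating the ontology server used for discovery from the service used for computation mirrors the two distinct parameters in Table 1.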
Additional files provided with this submission:
Additional file 1: neurotoxins.sdf, 66K
http://www.biomedcentral.com/imedia/2142678472587594/supp1.sdf
Additional file 2: calculateADescriptor.js, 0K
http://www.biomedcentral.com/imedia/1577960050587594/supp2.js


Readership Statistics

10 Readers on Mendeley

by Academic Status
  • 70% Ph.D. Student
  • 20% Post Doc
  • 10% Researcher (at a non-Academic Institution)

by Country
  • 30% Germany
  • 30% India
  • 10% Sweden