The Chemical Translation Service—a web-based tool to improve standardization of metabolomic reports
- DOI: 10.1093/bioinformatics/btq476
- PubMed: 20829444
BIOINFORMATICS APPLICATIONS NOTE, Vol. 26 no. 20 2010, pages 2647–2648
doi:10.1093/bioinformatics/btq476
Databases and ontologies
Advance Access publication September 9, 2010
The Chemical Translation Service—a web-based tool to improve
standardization of metabolomic reports
Gert Wohlgemuth (1), Pradeep Kumar Haldiya (1), Egon Willighagen (2), Tobias Kind (1)
and Oliver Fiehn (1,∗)
(1) University of California Davis, CA, Genome Center, USA and (2) Department of Pharmaceutical Science, Uppsala
University, Sweden
Associate Editor: Jonathan Wren
ABSTRACT
Summary: Metabolomic publications and databases use different
database identifiers or even trivial names, which prevents queries
across databases or between studies. The best way to annotate
metabolites is by chemical structures, encoded by the International
Chemical Identifier code (InChI) or InChIKey. We have implemented
a web-based Chemical Translation Service that performs batch
conversions of the most common compound identifiers, including
CAS, CHEBI, compound formulas, Human Metabolome Database
HMDB, InChI, InChIKey, IUPAC name, KEGG, LipidMaps, PubChem
CID+SID, SMILES and chemical synonym names. Batch conversion
downloads of 1410 CIDs are performed in 2.5 min. Structures are
automatically displayed.
Implementation: The software was implemented in Groovy and
JAVA, the web frontend was implemented in GRAILS and the
database used was PostgreSQL.
Availability: The source code and an online web interface are freely
available. Chemical Translation Service (CTS):
http://cts.fiehnlab.ucdavis.edu
Contact: ofiehn@ucdavis.edu
Received on May 11, 2010; revised on July 22, 2010; accepted on
August 14, 2010
1 INTRODUCTION
The Metabolomics Standards Initiative (MSI) proposed the use
of database identifiers for publishing reports in peer-reviewed
journals or in data repositories (Sumner et al., 2007), but the MSI
did not specify best-practice standards for which identifier to use.
Consequently, metabolomic data are presented by a wide variety
of identifiers, mostly using publicly available databases such as
KEGG, HMDB or PubChem. In other cases, authors merely use
compound names without referencing databases. Compound
names are very poor descriptors (Kind et al., 2009) as names often
cannot be unambiguously mapped to authentic chemical structures,
either because of missing chiral information (D, L) or because each
chemical structure is associated with many synonym names, some of
which may also be used for other structures. In addition, no database
contains all identifiers of all other repositories. For example,
KEGG LIGAND is a popular biochemical pathways database, but
it is incomplete for many compounds found in human organs as
given in the Human Metabolome database HMDB. Although each
database lists outlinks to other databases, no single database provides
comprehensive mapping options to other databases, and batch query
options are rarely offered.
∗To whom correspondence should be addressed.
Analytical chemists and biochemists may be unfamiliar with standard
structure codes or lack the expertise to download databases or
install software. Here we present a publicly available tool that
enables researchers to quickly convert lists of compound database
identifiers, including the important InChIKeys.
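To illustrate why the InChIKey is convenient as a common identifier, the following minimal sketch checks whether a string has the fixed 27-character layout of an InChIKey and splits it into its blocks. It is an illustration only and is not taken from the CTS source code; the function names `is_inchikey` and `split_inchikey` are assumptions introduced here. The check is purely syntactic and does not verify that the key belongs to a real structure.

```python
import re

# Layout of an InChIKey: 14 letters (connectivity hash), a hyphen,
# 10 letters (8 hash letters, a standard/non-standard flag, a version letter),
# a hyphen, and one final letter for the protonation state.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(text: str) -> bool:
    """Return True if `text` has the syntactic shape of an InChIKey."""
    return INCHIKEY_PATTERN.fullmatch(text.strip()) is not None


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its blocks (hypothetical helper for illustration)."""
    if not is_inchikey(key):
        raise ValueError(f"not an InChIKey: {key!r}")
    connectivity, second, protonation = key.strip().split("-")
    return {
        "connectivity": connectivity,     # hash of the molecular skeleton
        "remaining_layers": second[:8],   # hash of stereo and other layers
        "standard_flag": second[8],       # 'S' standard InChI, 'N' non-standard
        "version": second[9],             # 'A' for InChI version 1
        "protonation": protonation,       # 'N' means neutral
    }


if __name__ == "__main__":
    ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"  # InChIKey of ethanol
    print(is_inchikey(ethanol))
    print(is_inchikey("CCO"))  # a SMILES string, not an InChIKey
    print(split_inchikey(ethanol))
```

Because the key has a fixed length and alphabet, it can be recognized in mixed identifier lists without any chemistry toolkit, which is not the case for names or SMILES strings.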
Software: The Chemical Translation Service (CTS) was
implemented using the programming languages Groovy (v1.7) and
Java (v6.20). The open source web application framework Grails
1.2.2 and freely available plugins were used for the development of
web services. For data storage, PostgreSQL, an open-source
object-relational database management system, was used.
Easy access from other languages and platforms is provided via
SOAP (Simple Object Access Protocol) web services.
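As a rough illustration of what a SOAP call from another language involves, the sketch below builds a SOAP 1.1 request envelope with the Python standard library. The envelope namespace is the standard SOAP 1.1 one; everything service-specific is an assumption made for this example: the namespace `http://example.org/cts`, the operation name `convert` and the parameter names `from`, `to` and `value` are hypothetical and are not taken from the CTS interface description. The sketch does not send anything over the network.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; the real one would come from the service's WSDL.
SERVICE_NS = "http://example.org/cts"


def build_soap_request(operation: str, params: dict,
                       service_ns: str = SERVICE_NS) -> bytes:
    """Build a SOAP 1.1 envelope that wraps one operation call.

    `operation` and the keys of `params` are placeholders chosen by the
    caller; they must match whatever the target service actually defines.
    """
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical request: convert a KEGG identifier into an InChIKey.
    request = build_soap_request(
        "convert", {"from": "kegg", "to": "inchikey", "value": "C00469"}
    )
    print(request.decode("utf-8"))
```

In practice, a client would usually generate such requests from the WSDL published by the service rather than assemble them by hand.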
Hardware: The hardware used was a dual quad-core Intel X5450
Xeon based server with 16 GB of RAM and an SSD (Solid State
Disk) storage array with three disks in a RAID-0 configuration to
store the Lucene index files. The average disk throughput was
600 MB/s. An additional SAS (Serial Attached SCSI) storage array
with 16 disks in a RAID-6 configuration was used to store the database
content.
Source Code: The source code is hosted as a Google Code Project
at http://code.google.com/p/chemical-compound-repository/
Web Front End: The database is freely available at:
http://cts.fiehnlab.ucdavis.edu
2 RESULTS
The CTS consists of three major services.
(1) The Discovery Service detects chemicals in provided text
and returns a list of chemicals as CSV, TXT, XML or PDF.
(2) The Convert Service interconverts any chemical identifier
into other chemical identifiers.
(3) The Batch Convert Service converts multiple identifiers of
the same type into multiple identifiers.
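The difference between the Convert Service and the Batch Convert Service can be sketched with a small in-memory lookup table. This is a simplified illustration and not the CTS implementation, which stores its data in PostgreSQL; the table `COMPOUNDS` and the functions `convert` and `batch_convert` are names assumed for this example, and the two records are included only as familiar test cases.

```python
from typing import Dict, Iterable, List, Optional

# Toy lookup table; a real service would query a database instead.
COMPOUNDS: List[Dict[str, str]] = [
    {"name": "ethanol", "cas": "64-17-5", "kegg": "C00469",
     "pubchem_cid": "702", "smiles": "CCO",
     "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    {"name": "water", "cas": "7732-18-5", "kegg": "C00001",
     "pubchem_cid": "962", "smiles": "O",
     "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
]


def convert(value: str, source: str, target: str) -> Optional[str]:
    """Convert one identifier of type `source` into type `target`.

    Returns None when the identifier or the target type is unknown.
    """
    for record in COMPOUNDS:
        if record.get(source) == value:
            return record.get(target)
    return None


def batch_convert(values: Iterable[str], source: str,
                  targets: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Convert many identifiers of one type into several target types."""
    target_list = list(targets)
    rows = []
    for value in values:
        row: Dict[str, Optional[str]] = {source: value}
        for target in target_list:
            row[target] = convert(value, source, target)
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(convert("C00469", "kegg", "pubchem_cid"))
    for row in batch_convert(["C00469", "C00001", "C99999"],
                             "kegg", ["inchikey", "cas"]):
        print(row)
```

Keeping unknown identifiers in the output with empty values, rather than dropping them, lets the result be aligned row by row with the submitted list.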
We recommend that users, especially chemists and biologists, use
CTS for standardizing metabolomic reports into MSI-compliant
formats before publishing data. Single identifiers can be converted,
for example a KEGG identifier to a PubChem ID or a SMILES string to a KEGG ID.
Importantly, batch convert services are supported (Table 1),
© The Author(s) 2010. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License
(http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.