Free online resources enabling crowd-sourced drug discovery
- ISSN: 14694344
Abstract
The availability of freely accessible online resources to enable and support drug discovery has blossomed in recent years. The PubChem platform is now accompanied by a myriad of other online databases including ChEBI, DrugBank, the Human Metabolome Database and ChemSpider. The access to the array of software tools and diverse data in public domain provides capabilities previously only available within the confines of organisations (eg, big Pharma) that could afford significant investments in cheminformatics. This paper provides an overview of the internet resources available to drug discovery scientists and discusses the advantages of such accessibility but also the potential risks that reside within the data. It also examines what the present resources continue to lack and sets a vision for future approaches to providing internet-based resources for drug discovery.
Author-supplied keywords
Free online resources enabling crowd-sourced drug discovery
databases searchable by molecular structure
(Figure 1). Chemistry information on the internet
continues to become more widely accessible and at
an increasing rate. There are many freely available
chemical compound databases on the web1,2.
These databases generally contain the chemical
identifiers in the form of chemical names (system-
atic and trade) and registry numbers. Since the files
in the databases are assembled in a heterogeneous
manner, using variations in deposition processes
and procedures to handle chemical structures, the
resulting data are plagued with inconsistencies and
quality issues. There are many databases available
from which the drug discovery community can
derive value. These databases generally have a spe-
cific focus based on the domain expertise of the
hosting organisation; examples include databases
of curated literature data, chemical vendor cata-
logues, patents, analytical data, biological data,
etc. There are too many to include in this single
article so only a small number will be discussed.
For a broader view, the authors recommend a recent
article that assesses the expanding public and
commercial databases containing bioactive
compounds3 and concludes that the commercial efforts
are ahead of the public ones.
The availability of molecule databases such as
PubChem (http://pubchem.ncbi.nlm.nih.gov/) has
dramatically changed the landscape of publicly
available cheminformatics resources, yet
PubChem covers only a fraction of the chemical
universe, mostly of interest to chemical genomics
and pharmaceutical research. PubChem was
By Dr Antony J. Williams, Valery Tkachenko, Dr Chris Lipinski, Professor Alexander Tropsha and Dr Sean Ekins
Drug Discovery World Winter 2009/10
Cheminformatics
Pathways to Discovery’ component of the
Roadmap for Medical Research4. PubChem
archives and organises information about the bio-
logical activities of chemical compounds into a
comprehensive database and is the informatics
backbone for the Molecular Libraries and Imaging
Initiative, which is part of the NIH Roadmap.
PubChem is also intended to empower the scientific
community to use small molecule chemical compounds
in their research as molecular probes to
investigate important biological processes or gene
functions. The PubChem compound repository
presently contains more than 25 million unique
structures with biological property information
provided for many of the compounds. For now,
PubChem remains focused on its initial intent to
support the Molecular Libraries Initiative and
serves as an extremely valuable and authoritative
resource for cheminformatics and chemical
genomics. However, there are a number of con-
straints around the system, especially in its place
as a repository of data and information without a
special effort toward curating these data.
Naturally, in the absence of data curation any
errors in the data are transferred across many
online databases that depend on PubChem and
ultimately, the errors influence the quality of com-
putational models based on this data.
The Chemical Entities of Biological Interest, or
ChEBI database (http://www.ebi.ac.uk/chebi/) is a
highly curated database of molecular entities
focused on small chemical compounds. The entities
are either natural products or synthetic products
used to intervene in the processes of living organ-
isms. ChEBI includes an ontological classification
(Figure 2), whereby the relationships between
molecular entities or classes of entities and their
‘parents’ and/or ‘children’ are specified. While the
database presently offers access to close to 19,000
entities this is expected to expand to more than
440,000 by the end of October
(http://www.ebi.ac.uk/chebi/newsForward.do#ChEMBL%20data%20integration).
The database is
available for download by anonymous FTP
(ftp://ftp.ebi.ac.uk/pub/databases/chebi/).
The Human Metabolome Database
(http://www.hmdb.ca)5,6 (HMDB) is a compre-
hensive curated collection of human metabolite
and human metabolism data. It contains records
for more than 6,800 endogenous metabolites. In
addition to its comprehensive literature-derived
data, the HMDB also contains an extensive collection
of experimental metabolite concentration
data compiled from hundreds of mass spectrometry
(MS) and nuclear magnetic resonance (NMR)
metabolomic analyses performed on urine, blood
Figure 1: A graphical interpretation of the history of chem/bioinformatics software, model and database development, and increasing drug development costs versus registered compounds in the CAS Registry and the ChemSpider database. [Plot axes: year, 1960 to 2010; registered compounds (millions), 0 to 60.]
References
1 Williams, AJ (2008). A perspective of publicly accessible/open-access chemistry databases. Drug Discov Today 13 (11-12), 495-501.
2 Williams, AJ (2008). Internet-based tools for communication and collaboration in chemistry. Drug Discov Today 13 (11-12), 502-506.
3 Southan, C et al (2009). Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds. J Cheminformatics 1, 10.
4 Office of Portfolio Analysis and Strategic Initiatives, N.I.o.H (2008). The NIH Roadmap Initiative.
5 Wishart, DS et al (2007). HMDB: the Human Metabolome Database. Nucleic Acids Res 35 (Database issue), D521-526.
6 Wishart, DS et al (2009). HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 37 (Database issue), D603-610.
supplemented with thousands of NMR and MS
spectra collected on purified, reference metabo-
lites. Each metabolite entry in the HMDB contains
data fields including a comprehensive compound
description, names and synonyms, structural
information, physicochemical data, reference
NMR and MS spectra, biofluid concentrations,
disease associations, pathway information,
enzyme data, gene sequence data, SNP and muta-
tion data as well as extensive links to images, ref-
erences and other public databases. Recent
improvements have included spectra and substruc-
ture searching.
DrugBank (http://www.drugbank.ca/) is a manual-
ly curated resource7 assembled from a series of
other public domain databases (KEGG, PubChem,
ChEBI, PDB, Swiss-Prot and GenBank) and
enhanced with additional data generated within
the laboratories of the hosts. The database aggre-
gates both bioinformatics and cheminformatics
data and combines detailed drug data with com-
prehensive drug target (ie protein) information.
The database contains FDA approved small mole-
cule and biotech drugs as well as experimental
drugs, representing nearly 5,000 molecules8. The
database supports extensive text, sequence, chemi-
cal structure and relational query searches of the
nearly 100 data fields. The data from DrugBank
has been used to show that the drug to drug-target
relationship is scale-free and several classes of pro-
teins are selectively enriched as drug targets for
FDA approved drugs9.
ZINC (http://zinc.docking.org/index.shtml) is a
free, searchable database of commercially avail-
able compounds for virtual screening10,11. The
library contains more than 20 million molecules,
each with a 3D structure and gathered from the
catalogues of compounds from vendors. All mole-
cules in the databases are assigned biologically-rel-
evant protonation states and annotated with
molecular properties.
ChemSpider (http://www.chemspider.com/)1,2 is a
community resource for chemists provided by the
Royal Society of Chemistry (Figure 3). It offers a
number of facilities that distinguish the service
from many of the other databases listed in this arti-
cle. At the time of writing it contains more than 23
million unique chemical entities aggregated from
more than 200 diverse data sources, including gov-
ernment databases, chemical vendors, commercial
database vendors, publishers, all of the databases
listed above and from individual chemists.
ChemSpider has also integrated the SureChem
patent database collection (http://www.surechem.org/)
of structures to facilitate structure-based linking
to patents between the two data collections.
ChemSpider can be queried using struc-
ture/substructure searching and alphanumeric text
searching of both intrinsic as well as predicted
molecular properties. Unique capabilities relative
to other public chemistry databases include real
time curation of the data, association of analytical
data with chemical structures, real-time deposition
of single or batch chemical structures (including
with activity data) and transaction-based predic-
tions of physicochemical data. A series of web serv-
ices are provided to allow integration to the system
for the purpose of searching and linking with other
online databases from other groups (academia or
industry). The integration can be with free or com-
mercial resources. For example, Collaborative
Drug Discovery, Inc (http://www.collaborativedrug.com)
recently provided links to ChemSpider
for molecules in its CDD database12 thereby pro-
viding an integration path between a commercial
resource and a public domain database. CDD is a
highly secure, commercial collaborative drug dis-
covery informatics platform and a new type of col-
laborative system that handles a broad array of
Figure 2: The ChEBI database offers a detailed ontology including subdivision into (1) Molecular Structure, in which molecular entities or parts thereof are classified according to composition and structure; (2) Role, which classifies entities either on the basis of their role within a biological context, eg antibiotic, antiviral agent, coenzyme, hormone, or on the basis of their intended use by humans, eg pesticide, antirheumatic drug, fuel. The structure shown is for chloroquine, identified as an antimalarial quinoline alkaloid in the ChEBI ontology
ly shared among colleagues or openly shared in
standardised formats, at each research group’s dis-
cretion. A focus of CDD is facilitating the growth
of global collaborative research networks for neg-
lected diseases such as malaria, African sleeping
sickness, Chagas disease and tuberculosis.
There are currently 50 datasets available
to the public upon registration, which can be
readily substructure or similarity searched.
The importance of chemical data
curation in QSAR modelling
Molecular modellers and cheminformaticians alike
typically analyse data generated by other
researchers providing, in general, experimental
data. Consequently, when it comes to the quality of
these data modellers are always at the mercy of the
providers. Practically any modelling cheminfor-
matics study entails the calculation of chemical
descriptors that are expected to accurately reflect
the intricate details of the underlying chemical
structures. Obviously, any error in the structure
translates into either an inability to calculate the
descriptors for erroneous chemical records or into
erroneous descriptors. Naturally, the models built
with this data are either restricted to only a frac-
tion of the formally available data or, worse, they
are merely inaccurate. As both data and models of
the data, as well as the body of scholarly publica-
tions in cheminformatics, continue to grow, it
becomes increasingly important to address the
issue of data quality that inherently affects the
quality of models.
How significant is the problem of accurate struc-
ture representation as it concerns the adequacy and
accuracy of cheminformatics models? A few recent
reports indicate that this problem should be given
serious attention. For instance, benchmarking stud-
ies by a large group of collaborators from six labo-
ratories13,14 have clearly demonstrated that the
type of chemical descriptors has much greater influence
on the prediction performance of QSAR models
than the nature of the model optimisation techniques.
Furthermore, in another recent seminal
publication15, the authors clearly pointed out the
importance of chemical data curation in the context
of QSAR modelling (eg incorrect structures generated
from either correct or incorrect SMILES).
Their main conclusions were that small structural
errors within a dataset could lead to significant
losses in the predictive abilities of QSAR models. At
the same time they further demonstrated that man-
ual curation of the structural data leads to a sub-
stantial increase in the model predictivity15.
In their report highlighting the importance of
gathering accurate information to build the WOMBAT
and WOMBAT-PK databases, Oprea et al16 discussed
the error rate in medicinal chemistry publications.
They found an average of approximately two
errors per publication in the almost 6,800 papers
indexed in the WOMBAT database. With a median
of 25 compounds per series in a publication this
implied an overall error rate of 8% with errors
including17: incorrectly drawn or written structures,
unspecified position of attachment of substituents,
structures with the incorrect backbone, incorrect
generic names or chemical names or duplicates.
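The 8% figure follows from the two numbers quoted above, if the median series size is taken as a stand-in for the number of compounds per publication (an approximation made here for illustration only):

```latex
\[
\text{error rate} \approx
\frac{2\ \text{errors per publication}}{25\ \text{compounds per publication}}
= 0.08 = 8\%
\]
```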
The basic steps to curate a dataset of compounds
have been either considered trivial or ignored by
the experts in the field. For instance, several years
ago a group of experts in QSAR modelling devel-
oped what is now known as OECD QSAR model-
ing and validation principles18,19 that the
Figure 3: ChemSpider provides links to Wikipedia articles, links out to the original data sources and commercial suppliers, links out to patents and articles on PubMed. Flexible search capabilities are available, together with visualisation tools such as a real time 3D optimisation engine and display module
7 Wishart, DS et al (2006). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34 (Database issue), D668-672.
8 Wishart, DS et al (2008). DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36 (Database issue), D901-906.
9 Ma’ayan, A et al (2007). Network analysis of FDA approved drugs and their targets. Mt Sinai J Med 74 (1), 27-32.
10 Irwin, JJ and Shoichet, BK (2005). ZINC – a free database of commercially available compounds for virtual screening. J Chem Inf Model 45 (1), 177-182.
11 Irwin, JJ et al (2005). Virtual screening against metalloenzymes for inhibitors and substrates. Biochemistry 44 (37), 12316-12328.
12 Hohman, M et al (2009). Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery. Drug Disc Today 14, 261-270.
13 Tetko, IV et al (2008). Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inf Model 48 (9), 1733-1746.
acceptance of QSAR models. The need to curate
the primary data from which the models are
derived was not mentioned. The Journal of
Chemical Information and Modeling published a
special editorial highlighting the requirements for
QSAR papers that should be followed by authors
considering publishing their results in the journal20
and recent publications addressing common mis-
takes and criticising faulty practices in the QSAR
modelling field21-23 have appeared, yet none of
these sources have explicitly described and dis-
cussed the importance of chemical record curation
for developing robust QSAR models.
There is an obvious trend within the community
of QSAR modellers to develop and follow the stan-
dardised guidelines for developing statistically
robust and externally predictive QSAR models24.
The importance of developing best practices for
data preparation prior to initiating the modelling
process is obvious. There is therefore a pressing
need to amend the five OECD principles by adding
a sixth rule that would request careful data prepa-
ration prior to model development. There is a need
to develop and systematically employ standard
chemical record curation protocols that should be
helpful in the pre-processing of any chemical
dataset and these could be automated using existing
software packages (many of which are free for aca-
demic investigators). The essential procedures
include the removal of inorganic compounds, coun-
terions and mixtures (because for the most part the
current chemical descriptors do not account for
such molecular records), ring aromatisation, nor-
malisation of specific chemotypes, curation of tau-
tomeric forms and the deletion of duplicates.
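A few of the curation steps listed above can be sketched in plain Python working directly on SMILES strings. This is only an illustration under stated assumptions, not a protocol from the article: the function names `elements` and `curate` are hypothetical, a record with more than one carbon-containing fragment is conservatively treated as a mixture, and duplicates are detected by exact string match. Ring aromatisation, chemotype normalisation, tautomer handling and true structure-level duplicate detection need canonical structures from a cheminformatics toolkit and are not attempted here.

```python
import re

# Atom tokens: bracket atoms such as [Na+] or [C@@H] first, then two-letter
# organic-subset symbols before one-letter ones so "Cl" is not read as carbon.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol at the start of a bracket atom, after an optional isotope.
_BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def elements(fragment):
    """Return the element symbols found in one SMILES fragment."""
    symbols = []
    for token in _ATOM.findall(fragment):
        if token.startswith("["):
            match = _BRACKET_ELEMENT.match(token)
            symbols.append(match.group(1).capitalize() if match else "?")
        else:
            symbols.append(token.capitalize())
    return symbols


def curate(smiles_records):
    """Drop inorganics and mixtures, strip counterions, remove duplicates."""
    kept, seen = [], set()
    for record in smiles_records:
        fragments = [f for f in record.split(".") if f]
        organic = [f for f in fragments if "C" in elements(f)]
        if not organic:
            continue  # inorganic record: no carbon-containing fragment
        if len(organic) > 1:
            continue  # assumed mixture (conservative, hypothetical rule)
        parent = organic[0]  # carbon-free counterions such as [Na+] are dropped
        if parent in seen:
            continue  # exact-string duplicate only, not structure-aware
        seen.add(parent)
        kept.append(parent)
    return kept


if __name__ == "__main__":
    records = ["CCO", "[Na+].[Cl-]", "CC(=O)[O-].[Na+]", "CCO", "CCO.c1ccccc1", "O"]
    print(curate(records))
```

Note that the mixture rule would also reject salts with organic counterions (for example acetates), so in practice such records would need a curated counterion list rather than this blunt heuristic.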
Data analytical studies are impossible without
trusting the original data sources. It is important,
whenever possible, to verify the accuracy of the
primary data before developing any model. We
believe that this approach could be summarised by
a famous proverb ‘Trust, but verify’ that was fre-
quently used by the late president Ronald Reagan
during the cold war era and that traces back to the
founder of the Russian KGB Felix Dzerzhinsky
who invented it almost 100 years ago
(http://en.wikipedia.org/wiki/Trust_but_Verify).
Our hope is that other experts will also contribute
their expertise and best practices to this effort.
Improving the quality of putative hits
and leads
Hits or leads in rare, orphan and neglected diseases
(or for that matter many pharmaceutically relevant
targets) can arise from phenotypic or mechanistic
screening against commercially available screening
libraries. Often the screening efforts arise in an
academic setting. Because of the disconnect
between academic biology and expert medicinal
chemistry it is essential to carry out a medicinal
chemistry annotation of putative hits or leads
before expenditure of significant drug discovery
effort. The early stages of the annotation process
can be done using known filters and guidelines for
acceptable chemistry functionality. A more detailed
analysis asking questions about the chemistry of
the hit or lead, and what is known biologically and
chemically about substructures and similar com-
pounds to the hit or lead currently requires a
medicinal chemistry expert and takes on average
about 20 minutes per compound. The in-depth
data available through CAS SciFinder was used in
the annotation of 64 putative tools and probes
from the NIH Roadmap MLSCN effort25.
Progress towards public sector tools for chemistry
annotation might allow for a more affordable and
accessible process in the future. For example, many
companies have instituted filters (usually SMARTS
queries) to remove undesirable molecules, false
positives and frequent hitters from their HTS
screening libraries or to filter vendor compounds.
Early examples include REOS from Vertex26,
basic, hard and soft filters from GSK27 and func-
tional group compound filters from BMS28. These
are in addition to the many proprietary filters at
companies. A particular issue is chemical reactivity
towards protein thiol groups. A group from
Abbott reported a sensitive assay to detect reactive
molecules by NMR (ALARM NMR)29,30. A fol-
low up study used 8,800 compounds with data
from this assay to create a Bayesian classifier
model with extended connectivity fingerprints
(ECFP_6) with good classification accuracy to predict
reactivity31. This also identified 175 substructures
that are of interest as potential causes of
reactivity. Currently there is no freely accessible
automated method for filtering compounds or
alerting users to reactivity issues. If we were to take
this further, how could we encode the knowledge
of many medicinal chemists with drug discovery
expertise into a piece of software or database that
would identify chemical ‘trash’ or undesirable mol-
ecules for biologists? There is certainly some scope
here to influence the quality of hits and leads that
are published and annotate such molecules in pub-
lic databases.
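To make the idea of a fingerprint-based Bayesian reactivity classifier more concrete, the sketch below implements a Laplacian-corrected naive Bayes scorer over sets of substructure features. The article does not state which Bayesian variant was used in the cited study, so this formulation is an assumption, and the feature names are hypothetical stand-ins for ECFP_6 fingerprint bits. Each feature receives the weight log((A_f + 1) / (T_f * P + 1)), where T_f is the number of training compounds containing the feature, A_f the number of those that are reactive, and P the overall reactive fraction; a compound's score is the sum of the weights of its features.

```python
import math
from collections import Counter


def train(samples):
    """Learn per-feature weights from (feature_set, is_reactive) pairs."""
    total = len(samples)
    actives = sum(1 for _, reactive in samples if reactive)
    base_rate = actives / total  # overall fraction of reactive compounds
    seen = Counter()         # T_f: compounds containing feature f
    seen_active = Counter()  # A_f: reactive compounds containing feature f
    for features, reactive in samples:
        for f in features:
            seen[f] += 1
            if reactive:
                seen_active[f] += 1
    # Laplacian correction pulls rarely seen features towards a weight of zero.
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(weights, features):
    """Sum the weights of known features; unseen features contribute nothing."""
    return sum(weights.get(f, 0.0) for f in features)


if __name__ == "__main__":
    # Hypothetical training data: feature names are illustrative only.
    training = [
        ({"michael_acceptor", "amide"}, True),
        ({"michael_acceptor"}, True),
        ({"amide"}, False),
        ({"ether", "amide"}, False),
    ]
    w = train(training)
    print(score(w, {"michael_acceptor"}) > score(w, {"ether"}))
```

A higher score suggests a compound shares more features with the reactive training set; any cut-off for flagging compounds would have to be calibrated on real assay data.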
Discussion
Freely available databases and tools supporting
drug discovery and chemistry in particular are
14 Zhu, H et al (2008). Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. J Chem Inf Model 48 (4), 766-784.
15 Young, D et al (2008). Are the chemical structures in your QSAR correct? QSAR Comb Sci 27, 1337-1345.
16 Oprea, TI et al (2007). WOMBAT and WOMBAT-PK: Bioactivity Databases for Lead and Drug Discovery, Chemical Biology: From Small Molecules to Systems Biology and Drug Design. Schreiber, SL, Kapoor TM and Wess, G (Eds), Wiley-VCH, New York, 2007, pp. 760-786.
17 Oprea, TI et al (2003). On the propagation of errors in the QSAR literature in EuroQSAR 2002 – Designing drugs and crop protectants: Processes, problems and solutions. Ford, M, Livingstone, D, Dearden, J and Van de Waterbeemd H (Eds), New York, Blackwell Publishing, 2003, 314-315.
18 Dearden, JC et al (2009). How not to develop a quantitative structure-activity or structure-property relationship (QSAR/QSPR). SAR QSAR Environ Res 20 (3-4), 241-266.
19 Group, QE (2004). The report from the expert group on (Quantitative) Structure-Activity Relationships [(Q)SARs] on the principles for the validation of (Q)SARs. OECD Series on Testing and Assessment No. 49. ENV/JM/MONO(2004)24. Organization for Economic Cooperation and Development, Paris, France. 206 pp.
20 Jorgensen, WL (2006). QSAR/QSPR and proprietary data. J Chem Inf Model 46, 937.
21 Maggiora, GM (2006). On outliers and activity cliffs – why QSAR often disappoints. J Chem Inf Model 46 (4), 1535.
seeing more discussion about the need for more
pre-competitive32-35, competitive36 and collabora-
tive approaches12,32 in drug discovery and the
pharmaceutical industry in general, covering areas
such as informatics, ADME/tox and clinical. This
raises the question: “What could we achieve by just
making more software and data resources avail-
able on the web?” There is currently little in the
way of freely available resources for computation-
al ADME/Tox (apart from efforts like the ToxCast
project37,38 at the EPA where several hundred
compounds have been screened in more than 600
biological assays and the results have been made
public, representing a resource for future models)
so when will this change? Perhaps when more data is
placed in the public domain by the companies that
currently hold it closely. If more computational
tools and biological data were freely available it
would facilitate crowd-sourced drug discovery and
basically level the playing field for small (or one-
person) virtual companies versus other pharma
and biotech without requiring expensive tools and
databases (eg CAS SciFinder). In this case, anyone
with access to a computer anywhere in the world
can contribute to drug discovery regardless of
whether they belong to a company, research insti-
tute or not. Young gamers are already contributing
to the optimised folding of proteins as evidenced
by the success in the Community-Wide Experiment
on the Critical Assessment of Techniques for
Protein Structure Prediction, or CASP
(http://www.wired.com/medtech/genetics/magazine/17-05/ff_protein).
Such efforts represent truly
distributed discovery and could contribute to fully
integrated pharmaceutical networks. When this
occurs there will be more of a need to work with
highly dispersed individual researchers, store their
data and possibly take molecules to the next step,
eg enabling preclinical testing, animal studies etc.
This will then require companies such as
AssayDepot (http://www.assaydepot.com/) and
CDD to help generate and store data needed for
progressing molecules to clinical studies and find-
ing larger companies or organisations to take these
further. We are seeing a shift from requiring pow-
erful computers within insular organisations to do
drug discovery to using resources on the web, and
so this opens up being able to use cheap portable
and mobile devices to search databases and gener-
ate predictions from computational models. Of
course the quality of the output will be highly
dependent on the initial data quality.
Surprisingly, the investigations into how primary
data quality influences the quality of published
cheminformatics models have been almost absent
in the published literature. It appears that chemin-
formaticians and molecular modellers tend to take
published chemical and biological data at their face
value and launch calculations without carefully
examining the accuracy of data records. However,
there should be much less disagreement concerning
the exact chemical structure of compounds in the
databases except for arguably difficult issues such
as tautomers. Thus, the accuracy of the chemical
structure representation could be addressed direct-
ly in most cases.
Both common sense and the recent QSAR investi-
gations described above indicate that chemical
record curation should be viewed as a separate and
perhaps critical component of cheminformatics
research. By comparison, the community of protein
x-ray crystallographers has long recognised the
importance of structural data curation; indeed the
Protein Data Bank (PDB) team includes a large
group of structure curators whose only job is to
process and validate primary data submitted to the
PDB by experimental crystallographers39.
Furthermore, the NIH recently awarded a signifi-
cant Center grant to a group of scientists from the
University of Michigan
(http://www.genomeweb.com/informatics/nigms-allots-5m-new-database-house-protein-ligand-data-pharma-contribute)
to curate primary data on protein-ligand complexes
deposited to the PDB. Conversely, the largest pub-
licly funded cheminformatics project, ie, PubChem,
is considered a data repository and no special effort
is dedicated to the curation of structural informa-
tion deposited to PubChem by the various contribu-
tors. Chemical data curation has been addressed
whenever possible by the privately funded, but pub-
licly available, ChemSpider project as well as by sev-
eral other projects reviewed above. It is critical that
scientists who exploit and build models of datasets
derived from current databases or extracted from
publications dedicate their own effort to the task of
data curation.
Of course the hope of using cheminformatics
and databases in drug discovery is to increase the
efficiency and quality of molecules that progress
to later stages. Just identifying reactive molecules
and false positives could be of great utility to the
many groups that are not aware of this problem,
helping them to avoid dead ends. If we really are to empower
the user and do crowd-sourced drug discovery, we
will create issues with IP and the ownership of the
collaborative discovery. This consideration could
be one of the reasons why this approach has not
been followed before. Additionally, if we are to
identify gaps in the free tools to crowd-sourced
22 Zvinavashe, E et al (2008). Promises and pitfalls of quantitative structure-activity relationship approaches for predicting metabolism and toxicity. Chem Res Toxicol 21 (12), 2229-2236.
23 Johnson, SR (2008). The trouble with QSAR (or how I learned to stop worrying and embrace fallacy). J Chem Inf Model 48 (1), 25-26.
24 Tropsha, A and Golbraikh, A (2007). Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr Pharm Des 13 (34), 3494-3504.
25 Oprea, TI et al (2009). A crowdsourcing evaluation of the NIH chemical probes. Nat Chem Biol 5 (7), 441-447.
26 Walters, WP and Murcko, MA (2002). Prediction of ‘drug-likeness’. Adv Drug Del Rev 54, 255-271.
27 Hann, M et al (1999). Strategic pooling of compounds for high-throughput screening. J Chem Inf Comput Sci 39 (5), 897-902.
28 Pearce, BC et al (2006). An empirical process for the design of high-throughput screening deck filters. J Chem Inf Model 46 (3), 1060-1068.
29 Huth, JR et al (2005). ALARM NMR: a rapid and robust experimental method to detect reactive false positives in biochemical screens. J Am Chem Soc 127 (1), 217-224.
30 Huth, JR et al (2007). Toxicological evaluation of thiol-reactive compounds identified using a la assay to detect reactive molecules by nuclear magnetic resonance. Chem Res Toxicol 20 (12), 1752-1759.
31 Metz, JT et al (2007). Enhancement of chemical rules for predicting compound reactivity towards protein thiol groups. J Comput Aided Mol Des 21 (1-3), 139-144.
32 Louise-May, S et al (2009). Towards integrated web-based tools in drug discovery. Touch Briefings – Drug Discovery in Press.
having the molecules in a database but not physi-
cally having free access to them for testing. So, the
next big step will be how to make the physical
molecules more widely available to all, by making
them on demand or a centralised storage facility
funded by the NIH etc, a topic which is outside
the scope of this article but worth considering.
Conflicts of interest statement
Sean Ekins consults for Collaborative Drug
Discovery Inc and is on the advisory board for
AssayDepot. Antony J. Williams and Valery
Tkachenko are employed by the Royal Society of
Chemistry which owns ChemSpider and associated
technologies. Alexander Tropsha and Chris
Lipinski have no conflicts of interest. DDW
Dr Antony Williams is Vice-President, Strategic
development, for ChemSpider at the Royal Society
of Chemistry. He has authored more than 100 peer
reviewed papers and book chapters on NMR, pre-
dictive ADME methods, internet-based tools,
crowd-sourcing and database curation. He is an
active blogger and participant in the internet chem-
istry network.
Valery Tkachenko is Chief Technical Officer for
ChemSpider at the Royal Society of Chemistry. He
was intimately involved with the development of
the PubChem platform during his time with NIH
and has been involved with the development of
enterprise level web-based software applications
for the Life Sciences for well over a decade.
Dr Christopher Lipinski is a Scientific Advisor to
Melior Discovery. An ACS, AAPS and SBS mem-
ber, he is author of the ‘rule of five’, a member of
the ACS ‘Medicinal Chemistry Hall of Fame’ and
winner of multiple awards. An adjunct professor at
UMass Amherst, he has 235 publications and invit-
ed presentations and 17 issued US patents.
Professor Alexander Tropsha is K.H. Lee
Distinguished Professor and Chair of the Division
of Medicinal Chemistry and Natural Products in
the Eshelman School of Pharmacy, UNC-Chapel
Hill. His research interests are in the areas of
Computer-Assisted Drug Design, Computational
Toxicology, Cheminformatics, and Structural
Bioinformatics.
Dr Sean Ekins is a Computational Chemist and
has authored more than 130 peer reviewed papers
and book chapters as well as edited three books on
computational applications in pharmaceutical
R&D and computational toxicology. His areas of
interest are in vitro and computational
ADME/Tox, systems biology, cheminformatics and
computer-aided drug discovery.
33 Ekins, S and Williams, AJ (2009). Precompetitive Preclinical ADME/Tox Data: Set It Free on The Web to Facilitate Computational Model Building to Assist Drug Development. Lab on a Chip in Press.
34 Hunter, AJ (2008). The Innovative Medicines Initiative: a pre-competitive initiative to enhance the biomedical science base of Europe to expedite the development of new medicines for patients. Drug Discov Today 13 (9-10), 371-373.
35 Barnes, MR et al (2009). Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery. Nat Rev Drug Discov 8 (9), 701-708.
36 Bingham, A and Ekins, S (2009). Competitive Collaboration in the Pharmaceutical and Biotechnology Industry. Drug Disc Today Submitted.
37 Judson, R et al (2009). The toxicity data landscape for environmental chemicals. Environ Health Perspect 117 (5), 685-695.
38 Dix, DJ et al (2007). The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci 95 (1), 5-12.
39 Dutta, S et al (2008). Data deposition and annotation at the worldwide protein data bank. Methods Mol Biol 426, 81-101.