A perspective of publicly accessible/open-access chemistry databases.
- PubMed: 18549975
Abstract
The Internet has spawned access to unprecedented levels of information. For chemists the increasing number of resources they can use to access chemistry-related information provides them a valuable path to discovery of information, one which was previously limited to commercial and therefore constrained resources. The diversity of information continues to expand at a dramatic rate and, coupled with an increasing awareness for quality, curation and improved tools for focused searches, chemists are now able to find valuable information within a few seconds using a few keystrokes. This shift to publicly available resources offers great promise to the benefits of science and society yet brings with it increasing concern from commercial entities. This article will discuss the benefits and disruptions associated with an increase in publicly available scientific resources.
Drug Discovery Today, Volume 13, Numbers 11/12
June 2008 REVIEWS
Antony J. Williams
ChemZoo Inc., 904 Tamaras Circle, Wake Forest, NC 27587, United States
1359-6446/06/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.drudis.2008.03.017
Ask a chemist how often they use the Internet to search for
science-related information online and it would be fair to expect that
the answer would be daily. Certainly the web now dominates as the
primary portal to query for general information and data. Yet,
despite the tremendous growth in scientific Internet resources, the
potential for providing useful chemistry information and data
has only recently started to be tapped. Bioinformatics has led
the charge in providing online access to data, far ahead of the
efforts in chemistry. Open-access databases, like GenBank and the
Protein Data Bank, have been assisting biologists to translate gene
and protein sequences into biological relevance for well over two
decades. Some of the responsibility for the differences in efforts has
commonly been put onto the shoulders of publishers in chemistry,
whether for scientific articles or commercial chemistry databases.
Nevertheless, societal thrust, evangelists and group efforts are
forcing both free and open access (vide infra) to chemistry-related
information.
There are many indexes of chemistry databases online and this
article is not intended to be yet another. Rather, this article will
review both the availability and capabilities of online chemistry
databases, particularly those offering free or open access. The
progress in the availability of freely accessible information is
highly enabling and advantageous for the advancement of
science and our future well-being, but is probably seen as a
disruptive force by commercial bodies. This shift is needed, not
only to facilitate improved access to information in academia
and government laboratories but also in commercial organizations
feeling the pressure of poor performance in the discovery
and development processes. Pharmaceutical companies, in particular,
should welcome improved access to chemistry-related
information as their business dominance is taken to task with
drugs coming off patent and no replacement blockbuster drugs
in the pipeline.
In keeping with the web-based nature of this article, the major-
ity of references will actually be to Internet resources. At the time
of submission all references were active but, as is the nature of the
Internet, these resources will age and may disappear.
What is open versus free access?
There is much confusion around the differences between open
access (OA) and free access (FA). This is also accompanied by
corporate protectionism and political battles which have found
their way to Capitol Hill. Both OA and FA offer a great opportunity
to the advancement of science by sharing data, information and
knowledge as it is created. With these in place, publishing houses
and institutional repositories of information, specifically structure
databases with related information, are threatened by the poten-
tial impact on their business model and associated revenues.
The first major international statement on open access was the
Budapest Open Access Initiative (BOAI), in February 2002 (FAQs
online). The definition of open access from the BOAI frequently
asked questions website is as follows: “By ‘open access’ to this
literature, we mean its free availability on the public Internet,
permitting any user to read, download, copy, distribute, print,
search, or link to the full texts of these articles, crawl them for
indexing, pass them as data to software, or use them for any other
lawful purpose, without financial, legal, or technical barriers other
than those inseparable from gaining access to the internet itself.
The only constraint on reproduction and distribution, and the
only role for copyright in this domain, should be to give authors
control over the integrity of their work and the right to be properly
acknowledged and cited.” With this definition in mind, one can
understand the potential impact to publishing business models.
There are other concerns in the OA model but these are reviewed
elsewhere.
Free access is not equivalent to OA, but, by definition, OA
assumes free access as part of its model. This author could
not find an equivalent definition for free access online, but
suggests the following: ‘free access is access that removes price
barriers but not necessarily any permission barriers.’ Many
organizations provide free access to their publications, content
and data but the rules under which the information is made
available can differ between the hosts. Two specific examples are:
(1) The Royal Society of Chemistry offers access to thousands of
articles via its free access policies. Copyright remains with the
society.
(2) The SureChem patent portal (vide infra) provides chemical
structure-searchable access to patents online, free of charge.
They do, however, sell their content to organizations, provide
secure portal access and data export for organizations for a fee
and have a business model enabling both free access and for-profit
activities.
To most people free access is not inferior to open access – just
different. Most scientists are overjoyed to have free access to
information and data previously not available and will willingly
use such resources.
Political decision makers are now directly engaged in
reviewing the benefits of OA (and even FA) to science and
society and limiting open access. In November of 2007 President
Bush vetoed a bill that aimed to make all National Institutes of
Health (NIH)-funded research publications freely available on
the web. The furor that arose from the appearance of the
PubChem database as competition to other repositories
(specifically the Chemical Abstracts Service) has caused a significant
rift between members of the American Chemical Society (ACS) and
its management team.
While these challenges are yet to be fully navigated, it is clear
that publishers will need to modify their business models to
address the drive toward more FA and OA resources. Meanwhile,
groups such as the Public Library of Science and Chemistry
Central, discussed recently, are leading the charge for more
open access.
For the purpose of this article, both free access and open access
will be defined as no barriers to using the system in order to derive
value: no forced registration or logins in order to use the system.
This does not necessarily mean that there might not also be
fee-based services associated with the resource – a site can have free
and fee-based services on offer simultaneously.
Free- and open-access online chemistry databases
Depending on the definition of a database, there are many freely
available on the web. It is not uncommon to hear chemists
comment about a downloadable database of structures. This commonly
refers to a file containing a number of structures, typically
tens to tens of thousands, in the structure-data file (SDF) format.
In general, these files are then imported to a database for
viewing. There are many hundreds of SDF files available online
and, based on experience, a week of downloading, importing and
de-duplication could easily provide at least 15 million unique
structures. As this article goes to press the PubChem dataset
now numbers 18 million unique structures and can be down-
loaded as a single data source. These SDF files are commonly
provided by chemical vendors for the purpose of facilitating
commercial sales. Such files can contain structure identifiers
(names and numbers), experimental or physical properties, file-
specific identifiers and, commonly, pricing information. Since
they are assembled in a heterogeneous manner, such data are
plagued with inconsistencies and data quality issues.
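The download, import and de-duplication workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: it relies on the SDF conventions that records end with a `$$$$` line and that data items are introduced by `> <TAG>` headers, and it assumes each record carries a structure key in a data item named `INCHIKEY` (a hypothetical tag name; vendor files use many different tags, or none). The function names are also hypothetical. Real de-duplication would normally canonicalize each structure with a cheminformatics toolkit rather than trust a supplied key.

```python
from typing import Dict, Iterable, Iterator, List


def iter_sdf_records(text: str) -> Iterator[str]:
    """Yield each record of an SDF file; a record ends with a '$$$$' line."""
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if any(part.strip() for part in current):
                yield "\n".join(current)
            current = []
        else:
            current.append(line)
    # Tolerate a final record that lacks the closing delimiter.
    if any(part.strip() for part in current):
        yield "\n".join(current)


def data_fields(record: str) -> Dict[str, str]:
    """Collect the '> <TAG>' data items of one record into a dictionary."""
    fields: Dict[str, str] = {}
    lines = record.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            tag = line[line.index("<") + 1 : line.rindex(">")]
            values: List[str] = []
            i += 1
            # Values run until the blank line that closes the data item.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            fields[tag] = "\n".join(values)
        i += 1
    return fields


def deduplicate(sdf_texts: Iterable[str], key_tag: str = "INCHIKEY") -> Dict[str, str]:
    """Merge several SDF files, keeping the first record seen for each key.

    Records without the (assumed) key tag fall back to their raw text as key,
    so they are never silently dropped.
    """
    unique: Dict[str, str] = {}
    for text in sdf_texts:
        for record in iter_sdf_records(text):
            key = data_fields(record).get(key_tag) or record.strip()
            unique.setdefault(key, record)
    return unique
```

Calling `deduplicate` with the text of several vendor files returns one record per key. The inconsistencies noted above are exactly why the choice of key matters: two vendors can describe the same compound with different names, identifiers or drawing conventions.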
There are, however, a number of online database resources
offering access to valuable data and knowledge. Some of these
could be thought of as ‘linkbases’, a term which for the sake of this
article can be considered as a repository of molecular connection
tables linking out to multiple sources of data and associated
information. Although this review cannot be exhaustive we will
examine a number of these free resources and the value they are
starting to deliver to chemists.
PubChem
The highest-profile online database is certainly PubChem. NIH
launched the database in 2004 as part of a suite of databases
supporting the New Pathways to Discovery component of their
roadmap initiative. PubChem archives and organizes information
about the biological activities of chemical compounds into a
comprehensive biomedical database to support the Molecular
Libraries initiative component of the roadmap. PubChem is the
informatics backbone for the initiative and is intended to
empower the scientific community to use small molecule chemical
compounds in their research.
PubChem consists of three databases (PubChem Compound,
PubChem Substance and PubChem BioAssay) connected together
and incorporated into the Entrez information system of the
National Center for Biotechnology Information (NCBI). PubChem
Compound contains 18 million unique structures and provides
biological property information for each compound through links
to other Entrez databases (see Fig. 1 for an example). PubChem
Substance contains records of substances supplied to the system by
depositors, including publishers, chemical vendors, commercial
databases and other sources. It provides descriptions of chemicals and
links to PubMed, protein 3D structures and screening results. The
PubChem Compound database contains records of individual
compounds. PubChem BioAssay contains information about
bioassays using specific terms pertinent to the bioassay.
PubChem can be searched by alphanumeric text variables, such
as names of chemicals, property ranges or by structure, substructure
or structural similarity. As of December 2007, its content is
approaching 38.7 million substances and 18.4 million unique
structures.
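The search modes listed above (by name, structure, substructure and similarity) can also be exercised programmatically. The sketch below builds request URLs for PubChem's PUG REST web service, which is not described in this article and is assumed here purely for illustration; the helper names are hypothetical, and the SMILES input and the similarity threshold of 90 are arbitrary choices. Treat it as a starting point to check against the current PubChem documentation, not as a definitive client.

```python
import json
from typing import List
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base address of the PUG REST service (assumed, not taken from the article).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Map the search modes named in the text onto PUG REST operations.
STRUCTURE_MODES = {
    "structure": "fastidentity",
    "substructure": "fastsubstructure",
    "similarity": "fastsimilarity_2d",
}


def name_search_url(name: str) -> str:
    """URL that looks up compound identifiers (CIDs) by chemical name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def structure_search_url(smiles: str, mode: str = "substructure",
                         threshold: int = 90) -> str:
    """URL for an identity, substructure or 2D-similarity search by SMILES."""
    operation = STRUCTURE_MODES[mode]
    url = f"{PUG_REST}/compound/{operation}/smiles/{quote(smiles, safe='')}/cids/JSON"
    if mode == "similarity":
        # Threshold is the minimum similarity score, as a percentage.
        url += "?" + urlencode({"Threshold": threshold})
    return url


def fetch_cids(url: str) -> List[int]:
    """Run a query and return the matching CIDs (requires network access)."""
    with urlopen(url) as response:
        payload = json.load(response)
    return payload["IdentifierList"]["CID"]
```

For example, `fetch_cids(name_search_url("aspirin"))` would return the compound identifiers matching that name, while `structure_search_url("c1ccccc1", mode="similarity")` builds a similarity query around benzene. Percent-encoding the SMILES string keeps characters such as `=`, `#` and `/` from being misread as part of the URL.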