Preparing and Filtering Compound Databases for Virtual and Experimental Screening

by Maxwell D Cummings, Éric Arnoult, Christophe Buyck, Gary Tresadern, Ann M Vos, Jörg K Wegner
Screening (2011)

Available from eu.wiley.com

2
Preparing and Filtering Compound Databases
for Virtual and Experimental Screening
Maxwell D. Cummings, Eric Arnoult, Christophe Buyck, Gary Tresadern,
Ann M. Vos, and Jörg K. Wegner
2.1
Introduction
In drug discovery research, the term screening is used to describe a process in which
members of what can be a very large set of molecules are evaluated for a specific
biochemical activity, most commonly enzyme inhibition or receptor antagonism or
agonism. High-throughput screening (HTS) involves the testing of tens of thousands
to a million or more small molecules, seeking new starting points for medicinal
chemistry-based drug discovery. Increasingly, computer-based virtual screening is
used to prioritize the set of molecules that will be tested experimentally, by using one
or more scoring functions to predict the desired biochemical activity. In this context,
virtual screening methods are also used to explore virtual combinatorial libraries and
catalogs of purchasable compounds to selectively guide the synthesis or purchase of
additional compounds to enrich screening collections.
The term virtual screening spans a range of techniques; here, we use it to refer
to 3D methods like pharmacophore-based searching, automated docking, and
shape-based matching. These methods screen on the basis of matching a set of
pharmacophoric points, complementarity to a specified region of a target protein, or
overall shape matching (often with chemical features mapped to the query shape),
respectively. What these computational approaches have in common is that they all
are essentially methods for mining databases of molecules. A common resource for
all these virtual experiments is the collection or database of small molecules to be
mined.
Database preparation is an important aspect of virtual screening. Molecules must
be represented by one or more chemically sensible states and must be formatted
appropriately for the virtual screening tool to be used. Relevant considerations
include storage format(s), 2D–3D conversion or 3D structure generation, stereochemistry,
charge, tautomers, protonation, and conformers. The impact of different
database preparation protocols has not been thoroughly evaluated. Limited information
in this area indicates that database preparation does impact final screening
outcome (discussed below), but an ultimate best practice, if such is possible, remains
to be detailed.
Virtual Screening. Edited by C. Sotriffer
Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
ISBN: 978-3-527-32636-5
Database “filtering” most typically refers to methods for “cleaning up” a virtual
compound database, by removing undesirable (e.g., reactive functional groups, high
molecular weight (MW)) compounds; the same approaches can be used to preselect a
database subset enriched in desired properties. Here, we briefly summarize several
aspects of database management and discuss and expand upon material presented in
our previous survey of this topic [1]. Then we briefly review a few recent papers as well
as some relevant technological developments that were not considered in our
previous survey. Finally, we focus on various types of property- and target-based
filtering, as these approaches are an integral aspect of building, maintaining, and
using a large screening collection.
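As a rough illustration of this kind of "clean-up" filtering, the following Python sketch removes compounds that exceed a molecular weight cutoff or carry a flagged functional group. The `Compound` record, the 500 Da cutoff, and the group labels are assumptions made for this example only; the properties are taken as precomputed by some upstream tool.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    name: str
    mol_weight: float                         # precomputed molecular weight (Da)
    groups: set = field(default_factory=set)  # functional-group labels assigned upstream


# Hypothetical cutoff and labels, chosen only for illustration.
MAX_MW = 500.0
UNDESIRED_GROUPS = {"acyl_halide", "aldehyde"}


def passes_filter(cpd, max_mw=MAX_MW, undesired=UNDESIRED_GROUPS):
    """True if the compound is within the MW cutoff and has no flagged group."""
    return cpd.mol_weight <= max_mw and not (cpd.groups & undesired)


def filter_database(compounds, **kwargs):
    """Return only the compounds that pass the filter."""
    return [c for c in compounds if passes_filter(c, **kwargs)]


# Placeholder entries, not real compounds.
library = [
    Compound("cpd_A", 320.4),
    Compound("cpd_B", 712.9),                   # too heavy
    Compound("cpd_C", 150.1, {"acyl_halide"}),  # flagged reactive group
]
kept = [c.name for c in filter_database(library)]
print(kept)
```

The same predicate can be used to preselect a subset by supplying different thresholds, for example `filter_database(library, max_mw=350.0)`.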
2.2
Ligand Databases
Ligand preparation and database maintenance can be divided into several subtopics.
Ligands need to be represented as chemical data structures. Some ligands may
require multiple structures, with comprehensive representation requiring treatment
of chirality and/or tautomerization and/or protonation state(s). Depending on the
intended use of the database, each structure may further require elucidation of one or
more 3D conformers. Each of the resultant representations may then be annotated
with various types of information, for example, conformational energy, MW,
purchase or synthesis source, and amount of physical compound available. This
body of information must then be stored as completely and as compactly as possible.
In this section, we explore and comment on some of these aspects of virtual ligand
preparation.
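One possible way to organize the layers described above (ligand, alternative states, conformers, annotations) is sketched below in Python. All class and field names are assumptions made for this illustration, not a format used by any particular tool, and the annotation values in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Coords = List[Tuple[float, float, float]]


@dataclass
class Conformer:
    coords: Coords   # one (x, y, z) triple per atom
    energy: float    # conformational energy, in whatever units the generator reports


@dataclass
class LigandState:
    smiles: str      # one stereoisomer / tautomer / protonation state
    conformers: List[Conformer] = field(default_factory=list)


@dataclass
class LigandRecord:
    ligand_id: str
    mol_weight: float   # MW in Da
    source: str         # purchase or synthesis source
    amount_mg: float    # physical compound available
    states: List[LigandState] = field(default_factory=list)

    def n_representations(self) -> int:
        """Count stored representations; a state without conformers counts once."""
        return sum(max(1, len(s.conformers)) for s in self.states)


# Acetylacetone stored as two tautomers; coordinates are omitted (empty lists)
# and energies, source, and amount are placeholders.
record = LigandRecord(
    ligand_id="lig_0001",
    mol_weight=100.12,
    source="hypothetical vendor",
    amount_mg=0.0,
    states=[
        LigandState("CC(=O)CC(C)=O", [Conformer([], 0.0), Conformer([], 0.0)]),
        LigandState("CC(O)=CC(C)=O"),
    ],
)
print(record.n_representations())
```

Keeping ligand-level annotations on the record and attaching only coordinates and energies to each conformer avoids repeating the annotations for every representation, which is one way to address the compactness concern.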
2.2.1
Chemical Data Structures
Chemical file formats provide different levels of accuracy, and this translates to
different levels of reproducibility when structures are built from files. Translation
errors of chemical structures can occur when moving between programs if the two
programs are not communicating at the same level of data accuracy. SMILES [2] is
widely used to store chemical structures. In principle, SMILES can encode the full set
of structural information. In practice, unfortunately, different SMILES readers/
writers exist and this can lead to inconsistencies in chemical structures (Figure 2.1).
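One elementary aspect of this problem is that the same connectivity can be written as several different, equally valid SMILES strings, so strings produced by different writers cannot be compared directly. The Python sketch below illustrates only this point. It is an assumption-laden toy: the parser handles a small SMILES subset (uppercase organic-subset atoms, explicit bond symbols, branches, single-digit ring closures) and ignores hydrogens, charges, aromaticity, and stereochemistry. The `signature` function is a simple neighborhood-refinement invariant, not a canonicalization algorithm; equal signatures are necessary but not sufficient for identical structures.

```python
import re
from collections import Counter

TOKEN = re.compile(r"Cl|Br|[BCNOSPFI]|[=#\-]|[()]|\d")


def parse(smiles):
    """Parse a small SMILES subset into (elements, bonds).

    bonds maps frozenset({i, j}) to the bond order between atoms i and j.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES: {smiles}")
    elements, bonds = [], {}
    prev, stack, order, open_rings = None, [], 1, {}
    for tok in tokens:
        if tok == "(":
            stack.append(prev)            # remember the branch point
        elif tok == ")":
            prev = stack.pop()            # return to the branch point
        elif tok in ("-", "=", "#"):
            order = {"-": 1, "=": 2, "#": 3}[tok]
        elif tok.isdigit():
            if tok in open_rings:         # close a ring opened earlier
                other, ring_order = open_rings.pop(tok)
                bonds[frozenset((prev, other))] = max(order, ring_order)
            else:                         # open a new ring bond
                open_rings[tok] = (prev, order)
            order = 1
        else:                             # an atom symbol
            elements.append(tok)
            idx = len(elements) - 1
            if prev is not None:
                bonds[frozenset((prev, idx))] = order
            prev, order = idx, 1
    return elements, bonds


def signature(smiles, rounds=3):
    """Order-independent summary of the heavy-atom graph."""
    elements, bonds = parse(smiles)
    nbrs = {i: [] for i in range(len(elements))}
    for pair, order in bonds.items():
        i, j = tuple(pair)
        nbrs[i].append((j, order))
        nbrs[j].append((i, order))
    labels = list(elements)
    for _ in range(rounds):
        # Each atom's new label is its old label plus its sorted neighbor labels.
        labels = [
            labels[i] + "(" + ",".join(sorted(f"{o}{labels[j]}" for j, o in nbrs[i])) + ")"
            for i in range(len(elements))
        ]
    return Counter(labels)


ethanol_variants = ["CCO", "OCC", "C(O)C"]   # three spellings of ethanol
same_text = len(set(ethanol_variants)) == 1
same_graph = all(signature(s) == signature("CCO") for s in ethanol_variants)
print(same_text, same_graph)
```

In practice, a database would normally be passed through a single toolkit's canonical SMILES writer rather than a hand-written comparison like this one, and the remaining differences between toolkits (for example in aromaticity perception) are not addressed by this sketch.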
The MDL SD [3] and Sybyl mol2 (http://tripos.com/tripos_resources/fileroot/pdfs/mol2_format2.pdf)
formats can include 2D and 3D information, in contrast to SMILES, which contains
the topological arrangement of atoms but no atom type, 2D, or 3D coordinate
information. (Based on SMILES, 2D coordinates can be created
with structure diagram algorithms (e.g., Ref. [4]), and 3D coordinates built with
conformer generation algorithms (discussed below).) The OEChem/OpenBabel