Development of new molecular target drug

by Kenji Tamura
Gan to kagaku ryoho Cancer chemotherapy (2010)

Available from www.ncbi.nlm.nih.gov
10 PROTEIN X-RAY CRYSTALLOGRAPHY IN DRUG DISCOVERY
Peter Nollert, Michael D. Feese, Bart L. Staker, and Hidong Kim
deCODE biostructures
Bainbridge Island, Washington
10.1 INTRODUCTION 374
10.2 CRYSTALLIZATION 376
Background 377
Formats 380
Membrane Proteins 381
Crystallization Factors 382
Observation and Documentation 384
Crystallization Formulation Screens 384
Cryopreservation for Cryocrystallography 385
Crystal-Based Drug Discovery 386
Derivatization for Anomalous Diffraction Experiments 388
Crystallization Data Sources 388
Protein Crystallization Demonstration Experiment 389
10.3 X-RAY DIFFRACTION EXPERIMENT 389
Diffraction Theory 389
X-ray Diffraction by Macromolecule Crystals 390
Fourier Synthesis in X-ray Crystallography 392
Bragg’s Law and the Angular Dependence of X-ray Diffraction 396
Electron Clouds and Thermal Motion 398
X-ray Diffraction Data Collection in Practice 399
Home Laboratory Diffraction Data Collection 402
Synchrotron Diffraction Data Collection 403
10.4 X-RAY CRYSTAL STRUCTURE DETERMINATION 405
Phase Problem 405
Drug Discovery Handbook, by Shayne Cox Gad
Copyright © 2005 by John Wiley & Sons, Inc.
Structure Factor 405
Heavy Atom Replacement Methods 406
Multiple Isomorphous Replacement 407
Anomalous Dispersion Methods 412
Molecular Replacement 416
10.5 GENERATION AND ANALYSIS OF STRUCTURAL MODELS 420
Aspects of Crystallographic Models of Macromolecules 420
Building the Initial Model 422
Refinement and Analysis of Structural Models 425
Analysis and Preparation of Structural Models 432
Crystallography–Drug Discovery Interface 433
10.6 EXAMPLES FOR THE USE OF X-RAY CRYSTALLOGRAPHY
IN DRUG DISCOVERY 434
Lead Optimization—Structure-Based Drug Design 436
Antistructures 441
Protein Therapeutics 442
In silico Screening Based on Crystallographic Structural Models 442
Crystallographic Screening 443
Crystallographic Fragment Screening 444
Site-Directed Leads via Fragment Tethering 445
Structural Genomics 446
10.7 LIMITATIONS AND CHALLENGES OF X-RAY
CRYSTALLOGRAPHY IN THE DRUG DISCOVERY PROCESS 447
References 450
10.1 INTRODUCTION
The fundamental goal of applying protein X-ray crystallography to drug
discovery is to increase its speed, quality, and rate of success while reducing
its cost. In conjunction with molecular biology, protein X-ray crystallography
forms a seamless interface between target and lead discovery. Nucleic acid
sequences from genetic studies provide the entry point for protein expression,
the first step in crystallographic projects. The product of the crystallographic
endeavor, the structure, is the accurate description of protein and ligand atom
positions in three-dimensional space. Starting in the 1980s, protein X-ray crys-
tallography has impacted the drug discovery process, primarily in the stage of
lead optimization. Recently, however, by means of innovation and integration,
protein crystallography has been transformed into an enabling technology
now covering several stages of the drug discovery process (Fig. 10.1). Besides
lead optimization, protein X-ray crystallography has affected drug discovery
in (a) the identification of new drug targets, (b) the understanding of molec-
ular target mechanisms, and (c) the discovery of new lead compounds.
The integration of protein X-ray crystallography into the early discovery
process by joining it with chemical synthesis and assaying has become a very
effective tool for lead optimization. Chemical synthesis can be iteratively
directed toward more promising compounds and away from undesirable com-
pounds based on insight into the interaction of ligands with the target protein.
Thus, structure-based lead optimization has positively affected potency, selec-
tivity profile, or pharmacokinetic properties of drug candidates. The applica-
tion of protein X-ray crystallography impacts lead discovery by methods such
as (a) X-ray crystallographic screening, (b) fragment screening, and (c) site-
directed fragment tethering. Furthermore, structural information is being
used to design and test target structure-based compound libraries (e.g., com-
pound libraries specific for kinases) and to screen ligand compounds in silico
with the goal of enriching useful compounds in virtual libraries prior to high-
throughput assay-based screening.
X-ray crystallographic screening [1] combines the steps of lead identifica-
tion, structural assessment, and subsequent lead optimization by structure
determination of crystals that are soaked in compound cocktails. Ligands with
highest affinities are selected by the crystal, and they are identified crystallo-
graphically. The utility of this strategy is exemplified in a section below.
Fragment screening is a variation of this theme and serves as a tool for
assembling new lead compounds from small druglike fragments. First, libraries
of leadlike fragments are assembled and either co-crystallized with the target
protein or soaked into its crystals. The resulting structures serve as a basis
for assembling novel chemical leads. The advantage over conventional
assay-based screening is that fragments with novel structures and very low
affinity can be found.
Site-directed lead discovery by fragment tethering [2, 3] adds a further layer
of complexity to fragment screening, allowing the discovery of
low-molecular-weight ligands. The strategy is based on the covalent
modification of a target protein at a particular site on its surface via thiol
chemistry and on the mass-spectrometric detection of weakly binding ligand
precursors. Crystallography provides the tool to observe the binding mode and
to direct the chemical synthesis of fused analogs, which have enhanced affinity.

(Figure 10.1 diagram: drug discovery stages from target identification through
hit generation, lead optimization, and preclinical trials to clinical trials,
with the contributing methods: structural proteomics, crystallographic
screening, crystallographic fragment screening, site-directed leads via
fragment tethering, in silico screening based on crystallographic structural
models, design of structure-biased compound libraries, and structure-based
drug design.)
Figure 10.1 Impact of protein X-ray crystallography on, and use of structural
information for, drug discovery purposes.
Structure-based drug design consists of iterative processes aimed at optimizing
existing leads. The process includes choosing and determining a target
structure, choosing a method for lead discovery, and, crucially, evaluating
drug leads [4]. A schematic description of the iterative process involved in
structure-based drug design is shown in a later section (Fig. 10.22) [4]. The
success of structure-based lead optimization is measured by improvements in
affinity, specificity, or ADME properties.
Structure-based compound libraries can be obtained by so-called in silico
screening. Crystallographic models are used to enrich compound libraries
prior to conventional high-throughput screening [5]. Ligand structures are
“docked” in silico into the binding pockets of protein structures, and the
resulting modeled complexes often reproduce the correct ligand binding mode.
More importantly, sets of docked compounds can be ranked by applying scoring
functions that estimate ligand affinity, allowing actual chemical compound
libraries to be enriched with potentially good binders.
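A minimal sketch of the ranking step, assuming a precomputed docking score per compound in which lower (more negative) values mean tighter predicted binding. The compound IDs, scores, and "known binders" are hypothetical, and the enrichment factor used here (hit rate in the top-ranked slice divided by the hit rate of the whole library) is one common way of judging a ranking retrospectively, not a method taken from this chapter.

```python
from typing import Dict, List, Set


def rank_by_score(scores: Dict[str, float]) -> List[str]:
    """Sort compound IDs from best to worst docking score.

    Assumption: lower (more negative) scores mean tighter predicted binding,
    as in many, but not all, scoring functions.
    """
    return sorted(scores, key=lambda cid: scores[cid])


def enrichment_factor(ranked: List[str], binders: Set[str], top_fraction: float) -> float:
    """Hit rate in the top-ranked subset divided by the hit rate of the whole list."""
    n_top = max(1, round(len(ranked) * top_fraction))
    hits_top = sum(1 for cid in ranked[:n_top] if cid in binders)
    hits_all = sum(1 for cid in ranked if cid in binders)
    if hits_all == 0:
        return 0.0  # no known binders: enrichment is undefined, report 0
    return (hits_top / n_top) / (hits_all / len(ranked))


# Hypothetical docking scores (arbitrary units) for a ten-compound virtual library.
scores = {
    "cpd01": -9.1, "cpd02": -4.2, "cpd03": -7.8, "cpd04": -3.9, "cpd05": -5.5,
    "cpd06": -8.4, "cpd07": -2.7, "cpd08": -6.1, "cpd09": -4.8, "cpd10": -3.1,
}
known_binders = {"cpd01", "cpd06"}  # hypothetical assay results

ranked = rank_by_score(scores)
print(ranked[:3])
print(enrichment_factor(ranked, known_binders, top_fraction=0.2))
```

Here both hypothetical binders land in the top 20 percent, giving an enrichment factor of 5; a value of 1 would mean the ranking does no better than random selection.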
Structural genomics efforts seek to determine protein structures on the
genome scale. The availability of many X-ray crystallographic structures is
expected to deeply impact the discovery of new drugs and targets. Anticipated
benefits are, for example, the discovery of new targets by assignment of func-
tions to orphan targets, the improved quality of homology models for difficult
targets, and the use of antitargets to decrease ligand cross-reactivity.
This chapter seeks to familiarize scientists with X-ray protein
crystallographic techniques and with examples of their application in drug
discovery. It outlines the individual steps required for determining X-ray
crystallographic structures of proteins and protein–ligand complexes. First,
the generation of crystals and their use in the X-ray diffraction experiment
are described. Then state-of-the-art methods for crystal structure
determination, model generation, and model refinement are reviewed, followed
by an account of various ways of analyzing the resulting structural models.
Several prominent examples are presented where X-ray crystallography has
aided drug discovery. Finally, the limitations of the method are discussed.
10.2 CRYSTALLIZATION
Crystallography is the science of crystals. The word crystallography is a
composite derived from the Greek words crustallos (also transliterated
krustallos) and graphos (grafo), for writing. The word crustallos, meaning
“frozen” or “clear ice,” captures several macroscopic properties of these
fascinating materials. Besides being transparent and solid, crystals are often
faceted and, most
importantly, have a high degree of internal order. They consist of regular
three-dimensional lattices of molecules (see the later section on the
fundamentals of X-ray diffraction); in protein crystals these are protein
molecules, with or without their bound drug compounds. Protein crystals consist
of ordered protein and, usually, a sizeable fraction of disordered solvent
(water). How are such crystals obtained?
Although there are no fixed rules as to how to grow protein crystals, the
advent and combination of molecular biology, affinity tag purification tools,
and laboratory liquid dispensing automation has helped to define generalized
crystal growth procedures. Every protein, however, is different, and a method
or condition that works well for one protein usually does not carry over to
the crystallization of other proteins. Furthermore, it is not possible to
predict crystallization conditions from sequence. Therefore, the de novo
crystallization of a particular protein poses a significant challenge. While
crystallographers have used systematic approaches to search for
crystallization conditions, in most cases preformulated screening kits are used
(Table 10.1). Once a crystallization hit has been identified in a
crystallization scouting trial, the initial conditions are often refined to
grow crystals suitable for X-ray diffraction experiments. On the other hand,
many protein crystallization conditions have been reported in the scientific
literature and in databases (see below), and reproducing such a reported
condition is much simpler than de novo crystallization. Finally, comprehensive
and practical advice on crystallization experiments can be found in Bergfors
[6], McPherson [7, 8], and Ducruix and Giegé [9].
Background
Protein crystallization occurs in three stages: nucleation, growth, and
cessation of growth [10, 11]. Proteins crystallize from supersaturated
solutions, in which the concentration exceeds their equilibrium solubility. The
state of supersaturation depends on many factors, such as the concentration and
nature of the protein in question, but also on salts and other components of
the solution. It
is this state of supersaturation that needs to be created in a crystallization
experiment—the so-called crystallization setup—for crystal nuclei to form and
crystals to grow (Fig. 10.2). The underlying principle is to alter the properties
of the solvent (water), disrupt the interaction of water molecules with protein
molecules, and increase the attractive interaction among protein molecules.
This is generally done with (a) salts, (b) organic solvents, or (c) polymer
compounds. When salts dissolve in water, their ions become hydrated and capture
water molecules, which then become unavailable for interaction with proteins.
Therefore, adding salts such as ammonium sulfate can favor protein–protein
interactions, promote the formation of crystallization nuclei, and support
crystal growth, thus salting out the protein.
Intriguingly, the opposite procedure, salting in (the removal of ions from a
protein solution), may also be used for the crystallization of proteins. Ions
balance protein surface charges, and, once they are removed, protein mole-
cules can balance their electrostatics by interacting with each other [12].
TABLE 10.1 Selection of Vendors of Crystallography-Related Products

Crystallization tools
Hampton Research: crystal screening formulations, various crystallography supplies
Emerald BioSystems: crystal screening formulations, crystallization plates
Jena Bioscience: crystal screening formulations
Fluidigm: microfluidic crystallization devices
Nextal: crystallization plates
Corning: crystallization plates
Greiner: crystallization plates

Crystallization trial robotics
Emerald BioSystems: CrystalMiner (database application for crystallization
trials); CrysTel (integrated incubation and imaging system); CrystalMonitor
(automated crystallization imaging workstation); MatrixMaker (automated
solution formulation robot for the formulation of crystallization screening kits)
RoboDesign: RoboFill (automated crystallization); Odyssey (crystallization
incubation and imaging system); RoboMicroscope II (automated crystallization
imaging workstation)
Diversified Scientific: CrystalScore (automated crystallization imaging workstation)
Gilson: 925 PC Workstation (crystallization setup automation)
Discovery Partners International: Crystal Farm (integrated incubation and
imaging system); Rock Maker (software application for crystallization trials)
Douglas Instruments: crystallization automation

X-ray diffraction instrumentation
Bruker: X-ray diffraction instrumentation, detectors, optics, robotic sample changing system
Mar Research: X-ray diffraction instrumentation, detectors, optics, robotic sample changing system
Rigaku MSC: X-ray diffraction instrumentation, detectors, optics, robotic sample changing system
Osmic: X-ray optics
Organic solvents also bind water molecules, but in addition they lower the
dielectric constant in the crystallization solution, thus enhancing electrostatic
interactions between protein molecules. Finally, crowding agents such as
hydrophilic polymers compete with protein molecules for hydration. This
arises directly from water binding and indirectly via excluded-volume effects.
The mechanism of crystal nucleation is poorly understood. Once a protein
solution is supersaturated and nuclei are present, crystals may grow on pre-
formed crystal faces by deposition into lattice positions via diffusion and
convection [13]. The crystalline state of matter is generally considered the
thermodynamically most stable state, and the crystallization process can
therefore be understood in terms of free energy minimization. Indeed, a
lowering of the free energy by 12 to 25 kJ/mol has been measured for protein
crystallization processes [14]. The last stage of crystallization, the
cessation of growth, may occur for a number of reasons, such as limited protein
supply or the poisoning of crystal growth surfaces by contaminants and
crystalline defects.
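(Figure 10.2 diagram: protein concentration plotted against precipitant
concentration, with a solubility curve and a precipitation curve bounding the
supersaturation zone, which contains nucleation and growth regions, and the
precipitation zone; pathways A, B, and C are marked.)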
Figure 10.2 Schematic phase diagram of protein crystallization in the salting-out
regime. Crystal nucleation is a critical phenomenon that may occur only in a certain
area of the supersaturation zone and crystals may grow under conditions of supersat-
uration once nuclei have formed. (A) Pathway of a batch-type crystallization
experiment. By mixing protein with the precipitant solution, the protein becomes
supersaturated. Crystal nuclei form, and crystals grow until the protein
concentration in solution falls to the solubility limit. (B) Pathway of a vapor
diffusion-type crystallization experiment. The slow concentration process by
vapor diffusion that follows mixing of the
protein with the precipitant solution causes the protein to become supersaturated. The
vapor diffusion process causes a concomitant increase in precipitant concentration thus
extending the crystal growth process. (C) Pathway of a dialysis-type crystallization
experiment. As the precipitant diffuses into the protein chamber, a state of protein
supersaturation is reached. Once nuclei have formed, protein crystals may grow as long
as the protein concentration remains supersaturated.
Formats
A multitude of crystallization methods and formats are available, with batch
and vapor diffusion being the most popular ones (Fig. 10.3). Since the quan-
tity of protein sample is often limited, microcrystallization methods have been
devised for batch, vapor diffusion, and other, more exotic, crystallization
methods. They can all be carried out manually using appropriate plasticware
(Table 10.1) or with liquid dispensing instrumentation such as multichannel
pipettors or dispensing robots. Comparative studies of protein crystallization
by vapor diffusion and microbatch techniques found the two methods to be
similarly effective [15]. Nonetheless, a given method may succeed for some
protein targets while another works better for others.
(Figure 10.3 panels A to D: hanging drop and sitting drop vapor diffusion
setups, each with protein solution and precipitant, reservoir solution, glass
cover slide, and vacuum grease; batch drops under oil in a tray; hanging drop
of 1 µL protein solution + 1 µL well solution, 1 µL total volume after
equilibration.)
Figure 10.3 Schematic depiction of typical formats for vapor diffusion and
batch crystallization experiments. (a) Hanging drop format of a vapor diffusion
crystallization experiment. A small drop containing the protein and precipitant
solution is applied onto a glass slide. The glass slide is attached to the
crystallization chamber, the bottom of which is filled with the so-called
reservoir solution. The concentrations of volatile components in the drop and
reservoir solutions equilibrate over time through the vapor phase. Crystals
form in the hanging drop. (b) Sitting drop format of a vapor diffusion
crystallization experiment. A small drop containing the protein and precipitant
solution is placed into a well. The crystallization chamber may be sealed with
a glass cover slide or with transparent tape. Drop and reservoir communicate
via the vapor phase. Crystals form in the sitting drop. (c) “Under oil” format
of a batch crystallization experiment. A small drop containing protein and
precipitant solution is placed into a crystallization well. A layer of oil
prevents dehydration. Crystals form in the drop. (d) Setup and equilibration of
a vapor diffusion crystallization experiment in the hanging drop format. A
2-µL hanging drop (consisting of 1 µL protein solution and 1 µL precipitant
solution from the reservoir) equilibrates via the vapor phase. Water moves from
the hanging drop to the reservoir because the drop, which initially contains
only half the reservoir precipitant concentration, has a higher water vapor
pressure than the reservoir solution. At equilibrium the hanging drop has a
volume of 1 µL, with protein and precipitant concentrations similar to those of
the initially separated components.
Batch-type crystallization setups are conceptually simple experiments. Here, a
protein solution is combined with a precipitant solution, and crystals form
within this solution. To prevent dehydration, oils may be added to seal off the
crystallization experiment [16, 17] (Fig. 10.3c). Popular trays for microbatch
crystallization are so-called Terasaki plates, which provide small wells that
can hold final trial volumes of less than 1 µL [15] and 5 to 10 mL of oil
(silicone oil, paraffin oil, or mixtures). Because the state of supersaturation
must be reached when the protein solution is diluted with the precipitant
solution (A in Fig. 10.2), starting protein concentrations usually need to be
higher for this crystallization method than for the other methods.
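The dilution effect behind this is simple arithmetic. The sketch below assumes ideal mixing with additive volumes; the function names and example concentrations are hypothetical.

```python
from typing import Tuple


def batch_mix(c_protein: float, v_protein: float,
              c_precip: float, v_precip: float) -> Tuple[float, float]:
    """Protein and precipitant concentrations right after mixing two solutions.

    Assumes ideal mixing with additive volumes; concentration units are kept
    as given (e.g. mg/mL for protein, M for precipitant).
    """
    v_total = v_protein + v_precip
    return c_protein * v_protein / v_total, c_precip * v_precip / v_total


def protein_stock_needed(c_target: float, v_protein: float, v_precip: float) -> float:
    """Protein stock concentration required to reach c_target in the mixed drop."""
    return c_target * (v_protein + v_precip) / v_protein


# Hypothetical 1 uL + 1 uL microbatch drop: 20 mg/mL protein, 2.0 M ammonium sulfate.
print(batch_mix(20.0, 1.0, 2.0, 1.0))
# Stock needed for 15 mg/mL protein in the drop at a 1:1 mixing ratio.
print(protein_stock_needed(15.0, 0.5, 0.5))
```

At a 1:1 ratio every component is halved on mixing, so a drop meant to contain 15 mg/mL protein needs a 30 mg/mL stock.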
Vapor diffusion crystallization experiments can be carried out in sitting drop
or in hanging drop format. While the hanging drop arrangement is less prone
to crystals sticking to a solid surface, it is less cumbersome to set up sitting
drop crystallizations. In a hanging drop crystallization experiment, the protein
solution is combined with the precipitant solution on a glass cover slide,
which is then inverted and attached to a well containing the reservoir solution (Fig. 10.3a).
Prior to the attachment, the rim of the well is beaded with vacuum grease in
order to prevent dehydration of the crystallization chamber. The sitting drop
format is very similar. Specialized crystallization plates are used that provide
a platform to hold the drop (Fig. 10.3b). The chamber may be sealed off with
transparent tape or with vacuum grease and a glass cover slide.
In vapor diffusion experiments, the reservoir is usually filled with the
precipitant solution. Once protein and precipitant solution are combined and
the crystallization chamber is sealed, the crystallization drop and the
reservoir solution communicate via the vapor phase. At a typical 1:1 mixing
ratio, the concentration of precipitant in the crystallization drop is only
half of that in the reservoir solution (Fig. 10.3d). During equilibration, the
precipitant concentration in the drop rises to that in the reservoir and,
concomitantly, the drop volume decreases by about half (B in Fig. 10.2). The
kinetics of this equilibration process and the pathway to supersaturation
depend on the type of precipitant, the crystallization chamber and drop
geometry, and temperature. Crystals may form during or after equilibration.
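The volume and concentration changes can be estimated with a simple mass balance: the amount of precipitant in the drop stays constant, so the drop shrinks until its precipitant concentration reaches the reservoir value. The sketch below assumes water is the only volatile species and the reservoir is large enough that its concentration does not change; real drops only approach this endpoint, and the kinetics mentioned above are ignored. Function and variable names are illustrative.

```python
from typing import Dict


def vapor_diffusion_endpoint(v_protein: float, v_reservoir_added: float,
                             c_protein: float, c_reservoir: float) -> Dict[str, float]:
    """Idealized start and end state of a vapor diffusion drop.

    Assumptions: water is the only volatile component, the reservoir is so much
    larger than the drop that its precipitant concentration stays at c_reservoir,
    and the drop equilibrates exactly to that concentration.
    """
    v_start = v_protein + v_reservoir_added
    precip_start = c_reservoir * v_reservoir_added / v_start
    # Precipitant amount in the drop is conserved; water leaves until the drop
    # precipitant concentration matches the reservoir.
    v_end = precip_start * v_start / c_reservoir
    return {
        "volume_start": v_start,
        "precipitant_start": precip_start,
        "protein_start": c_protein * v_protein / v_start,
        "volume_end": v_end,
        "precipitant_end": c_reservoir,
        "protein_end": c_protein * v_protein / v_end,
    }


# The 1:1 case of Fig. 10.3d with hypothetical concentrations:
# 1 uL of 10 mg/mL protein plus 1 uL of 2.0 M reservoir solution.
for key, value in vapor_diffusion_endpoint(1.0, 1.0, 10.0, 2.0).items():
    print(key, value)
```

For the 1:1 case the drop ends at half its starting volume, with protein and precipitant back at the concentrations of the separate starting solutions, consistent with Fig. 10.3d; other mixing ratios change the final concentration factor.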
Protein crystallization by dialysis is somewhat less popular but provides a
high degree of control in the crystallization process [18]. First, a protein solu-
tion is filled into a small chamber that is closed by a dialysis membrane. Then
a precipitant solution is brought into contact with that membrane and an
exchange of small molecules and ions can occur, while proteins and other
macromolecules are retained by the dialysis membrane. Thus, salt concentrations
can be increased slowly without overshooting into the precipitation zone (C in
Fig. 10.2).
Membrane Proteins
About half of all drug targets are membrane proteins [19], proteins that are
inserted in cellular membranes where they support critical functions such
as solute transport, energy conversion, and signal transduction. Unfortunately,
not a single structure of a transmembrane protein of pharmaceutical relevance
is available to date. This is a result of the inherent difficulties of working with
membrane proteins in the stages of expression, purification, and
crystallization. There is, however, no reason in principle why such proteins
should systematically fail crystallization attempts. Indeed, many structures of membrane proteins
have been determined once concerted efforts were made, including structures of
proteins homologous to members of the most prominent class of transmembrane
protein drug targets, the G-protein-coupled receptors (GPCRs). Several methods
have been devised for the crystallization of this class of proteins, including the
use of protein–detergent complexes, bicelles, and lipidic cubic phases [20, 21].
The latter method employs a lipid matrix as a crystallization facilitator, and
the practicalities are described in detail by Nollert et al. [22].
Crystallization Factors
What are the factors that affect the crystal nucleation and growth processes?
Of course, the quality and purity of the protein sample are of great importance,
since impurities may interfere with crystal packing. Fundamentally, the
protein sequence, its organization into domains, and oligomerization state are
decisive factors. The purity of proteins is usually judged in a semiquantitative
way by sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis
(PAGE) and Coomassie brilliant blue staining. Purity levels above 90 to 95
percent are considered acceptable. For some proteins, however, crystallization
can be a purification method, and crystals of minor contaminants have been
grown from mixtures of many proteins. Particle homogeneity, that is, the
uniformity of the oligomeric state, is also an important factor, usually
characterized by dynamic light scattering or by size exclusion chromatography
(SEC). SEC is often applied as a final so-called “polishing step” in protein
purification. Prior to use, protein samples are often centrifuged or filtered
to remove particulate contaminants. To prevent degradation, protease inhibitors
and antimicrobials such as sodium azide or potassium cyanide (0.02 percent
final concentration) are sometimes added. The concentrations of the protein
and of the precipitating agent have a tremendous effect on crystallization
(Fig. 10.2). It is advisable to start crystallization experiments at high
protein concentrations (greater than 10 mg/mL) and to lower them once
precipitate is detected in crystallization setups (Fig. 10.4b). Conversely, the
protein and precipitant concentrations may need to be increased if the drop
remains clear (Fig. 10.4a).
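The rule of thumb for adjusting concentrations after a first round of setups can be written as a small decision function. The observation labels and the default step factor of 1.5 are hypothetical choices for illustration; the text itself does not prescribe a step size.

```python
def adjust_concentration(current: float, observation: str, step: float = 1.5) -> float:
    """Suggest a protein (or precipitant) concentration for the next round.

    Encodes the rule of thumb from the text: lower the concentration when
    precipitate appears, raise it when the drop stays clear. The step factor
    of 1.5 is a hypothetical default, not a recommendation from the source.
    """
    obs = observation.strip().lower()
    if obs == "precipitate":
        return current / step
    if obs == "clear":
        return current * step
    if obs == "crystals":
        return current  # keep the condition and move on to optimization
    raise ValueError(f"unrecognized observation: {observation!r}")


start = 12.0  # mg/mL, above the ~10 mg/mL starting point suggested in the text
print(adjust_concentration(start, "precipitate", step=2.0))
print(adjust_concentration(start, "clear", step=2.0))
```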
Recombinant proteins are often expressed with tag fusions to ensure high
expression yield, solubility, protection from proteolysis, improved folding,
and simple purification via affinity chromatography [23]. Their presence can
either aid or impede crystallization, depending on the nature of the fusion,
linker, and host protein. Several crystal structures have been obtained with
large fusion partners (including maltose-binding protein, thioredoxin, and
glutathione S-transferase) left uncleaved, in cases where short linkers of
three to five amino acids were employed [24]. Alternatively, these tags may be
removed with specific proteolytic enzymes that cleave appropriately engineered
linker sites between the tag and the host protein. Once the protein is purified
and concentrated, it may be stored after rapid freezing in liquid nitrogen. A
cooling procedure using protein solution volumes below 50 µL in 0.2-mL
ultrathin-walled polymerase chain reaction (PCR) tubes proved beneficial [25].
Fast thawing to room temperature was critical to prevent precipitation.
Besides precipitant type and concentration, further factors that affect crys-
tallization include temperature, buffer type and concentration, the presence of
additives, crystallization format, geometry, and other environmental parame-
ters. It is not possible to screen all of these factors systematically. Several
hundred crystallization experiments are therefore usually carried out varying
the temperature (4 and 20°C) and formulations of precipitating agents, while
all other parameters are kept constant. Subsequent fine screening may then be
accomplished by systematically screening other factors. Rather exotic factors
such as electric and magnetic fields have been shown to affect crystal quality.
Their systematic use in the crystallography laboratory, however, is limited.
Figure 10.4 Images of hanging drop crystallization experiments. (a) Clear
1-µL drop at the outset of the crystallization experiment. (b) Precipitate.
(c) Crystals of lysozyme inside a hanging drop. (d) Hanging drop with
birefringent lysozyme crystals, imaged between crossed polarizers.
of the limitations of these screens by combining multiple precipitation agents.
If crystal hits are not obtained with the initial setup, follow-up experiments
should be set up with lowered protein or precipitant concentrations where pre-
cipitate was observed and increased protein concentrations where clear drops
were observed. In cases where no crystals are obtained from such trials, exper-
imenters often switch to a different screening kit, or many screening kits are
set up in parallel at the outset. Due to the stochastic nature of screening in
general, it is impossible to know a priori how many crystallization experiments
need to be set up to obtain a crystallization hit. Based on statistical
analysis and a 1 to 2 percent likelihood of obtaining an initial hit per
condition for typical proteins [29], it is estimated that screening 459 (at
1 percent) or 228 (at 2 percent) random crystallization conditions provides a
99 percent likelihood of observing at least one crystallization hit. The
software program CRYSTOOL can generate such random screens.
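The quoted numbers are consistent with treating each random condition as an independent trial with hit probability p, so that the chance of at least one hit among n conditions is 1 - (1 - p)^n. Solving for the smallest n that reaches 99 percent, as sketched below (independence is an idealization), gives 228 conditions for p = 2 percent and 459 for p = 1 percent.

```python
import math


def p_at_least_one_hit(hit_rate: float, n_conditions: int) -> float:
    """Probability of at least one hit among n independent conditions."""
    return 1.0 - (1.0 - hit_rate) ** n_conditions


def conditions_needed(hit_rate: float, confidence: float = 0.99) -> int:
    """Smallest n with p_at_least_one_hit(hit_rate, n) >= confidence."""
    if not (0.0 < hit_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("hit_rate and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


for p in (0.02, 0.01):
    n = conditions_needed(p)
    print(p, n, round(p_at_least_one_hit(p, n), 4))
```

Running the loop reports 228 conditions for p = 0.02 and 459 for p = 0.01.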
The systematic assessment of protein precipitation and solubility behavior
is an iterative process. A particular precipitation agent is chosen and combined
with the protein in batch-type crystallization experiments to map out a phase
diagram similar to that shown in Figure 10.2. This is repeated for other pre-
cipitation agents if hits are not produced. Popular precipitation agents are
ammonium sulfate, polyethylene glycols, Na/K phosphate, sodium chloride,
and MPD (2-methyl-2,4-pentanediol) [30].
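Mapping a phase diagram in batch, as described above, amounts to pipetting a grid of protein and precipitant concentrations. The sketch below computes the volumes for each well under stated assumptions (additive volumes, buffer making up the remainder); the stock concentrations, targets, and 2 µL drop size are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Well = Tuple[float, float, float, float, float]


def batch_grid(protein_stock: float, precip_stock: float, final_volume: float,
               protein_targets: List[float], precip_targets: List[float]) -> List[Well]:
    """Pipetting scheme for a batch phase-diagram grid.

    Each well is (protein target, precipitant target, protein-stock volume,
    precipitant-stock volume, buffer volume). Assumes additive volumes and that
    buffer makes up the remainder; combinations that would need more stock than
    the final volume allows are skipped because they cannot be reached.
    """
    wells: List[Well] = []
    for c_p, c_s in product(protein_targets, precip_targets):
        v_p = c_p * final_volume / protein_stock
        v_s = c_s * final_volume / precip_stock
        v_b = final_volume - v_p - v_s
        if v_b < -1e-9:  # tolerance for floating-point round-off
            continue
        wells.append((c_p, c_s, v_p, v_s, max(v_b, 0.0)))
    return wells


# Hypothetical example: 20 mg/mL protein stock, 3.0 M ammonium sulfate stock,
# 2 uL final drops.
for well in batch_grid(20.0, 3.0, 2.0, [5.0, 10.0, 15.0], [0.5, 1.0, 1.5, 2.0]):
    print("protein %.1f mg/mL, salt %.1f M: %.2f + %.2f + %.2f uL" % well)
```

Wells that cannot be reached with the given stocks (here, high protein combined with high salt) drop out, which signals that more concentrated stocks are needed if that corner of the diagram matters.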
Both crystallization approaches, systematic assessment or random screen-
ing, may provide either crystals suitable for X-ray diffraction experiments or
poor hits that require refinement of crystallization conditions. In the latter
case, fine screens varying one component concentration at a time are prepared
in order to optimize crystal growth. In addition, crystal seeding experiments
can be set up in which small crystals or microcrystals serve as nucleation
points for growing larger crystals suitable for X-ray diffraction data collection [6].
Some of these different crystallization methodologies have been shown to
work better for one protein than for others. Therefore, in de novo crystalliza-
tion projects the type of crystallization method applied may be a useful para-
meter to test.
Cryopreservation for Cryocrystallography
In order to expose protein crystals to a beam of X-rays, individual crystals are
either mounted within a glass capillary or “fished” with a filament loop and
cooled to low temperatures, for example, in liquid nitrogen [31, 32] or in
cooled propane, ethane, CCl3F, or BrCF3. A gaseous nitrogen jet is the most
popular method for cooling crystals and allows X-ray diffraction experiments to
be carried out at temperatures around 100 K. Cooling reduces X-ray radiation
damage and thermal motion in the protein, both of which improve diffraction
data quality.
Ice crystals diffract X-rays and produce characteristic powder diffraction
rings; their formation during the crystal cooling process should therefore be
avoided. This can be done by flash-cooling, which transforms liquid water into
a vitreous (glassy) state. Glass formation can be aided by the addition of
so-called cryoprotectants such as glycerol, 2-methyl-2,4-pentanediol (MPD), or
trehalose. A popular strategy is to first prepare cryoprotectant:precipitant
solutions at different mixing ratios, test their diffraction properties when
frozen, and use the solution with the lowest cryoprotectant content that does
not yield ice rings.
Alternatively, ice formation on the protein crystal during the cooling process
may be suppressed with high concentrations of certain salts. Lithium formate,
lithium chloride, and other highly soluble salts, so-called cryosalts, have
been shown to suppress the formation of water ice upon cooling from ambient
temperature to around 100 K [33]. Cryoprotectants may be avoided altogether by
stripping surface-associated water from crystals by pulling them through oil,
for example, perfluoropolyether, paratone-N, or paratone-N/mineral oil mixtures
[34].
Crystal-Based Drug Discovery
There are two ways to prepare crystals of target proteins with their bound
small-molecule ligands. One is co-crystallization where the ligand–protein
complex is crystallized, the other is soaking of pre-formed apo-crystals with a
solution containing ligand molecules. While soaking may be very economical
since a single crystallization setup can provide up to hundreds of usable crys-
tals, not all ligands can be soaked in, with solubility issues posing a common
problem. In co-crystallization experiments, at equilibrium two protein species
are present, the apo-protein (no ligand bound) and the ligand-bound form.
The relative fraction of these two forms is determined by the affinity constant
Ka, the concentration of the ligand, and the concentration of the protein. For
a simple single-binding-site model with the ligand in excess, the equilibrium
binding is given by
$$[PL] = \frac{[P]_T \, K_a [L]}{1 + K_a [L]} \qquad (10.1)$$

with [PL] the concentration of the protein–ligand complex, [P]_T the total
protein concentration, and [L] the concentration of the free ligand. In many
cases, it is difficult to estimate the proper concentrations in a crystallization
setup. The goal is, of course, to populate most protein molecules with their
ligand. As a rule of thumb, the concentration of a ligand used in a
crystallization setup should exceed that of the protein by more than a factor
of ten. A typical protein sample for crystallization contains around 10 mg/mL
protein; depending on the molecular weight of the protein, this corresponds to
a protein concentration on the order of 250 µM (for example, for a 40-kDa
protein). The ligand concentration should also exceed the dissociation constant
of the ligand (the reciprocal of the association constant) by more than a
factor of 10. Thus, if possible, a ligand concentration of more than 1 mM is
often used.
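The rule of thumb above can be checked numerically with Eq. (10.1). The Python sketch below is illustrative only: the function names are hypothetical, the 40-kDa molecular weight and the affinities are assumed example values, and the second function swaps the excess-ligand approximation of Eq. (10.1) for the exact solution of the 1:1 binding quadratic written in total concentrations.

```python
import math


def protein_molarity_uM(conc_mg_per_ml: float, mw_da: float) -> float:
    """Convert a protein concentration in mg/mL to micromolar (uM)."""
    # mg/mL equals g/L; dividing by the molar mass (g/mol) gives mol/L.
    return conc_mg_per_ml / mw_da * 1e6


def fraction_bound_excess(ligand_uM: float, kd_uM: float) -> float:
    """Eq. (10.1) rearranged: [PL]/[P]_T = Ka[L] / (1 + Ka[L]).

    Treats [L] as the free ligand concentration, which is only
    approximately the added ligand concentration when ligand is in excess.
    """
    ka = 1.0 / kd_uM  # association constant = 1 / dissociation constant
    return ka * ligand_uM / (1.0 + ka * ligand_uM)


def fraction_bound_exact(protein_uM: float, ligand_uM: float, kd_uM: float) -> float:
    """Exact 1:1 binding from *total* protein and ligand concentrations.

    Solves [PL]^2 - (P + L + Kd)[PL] + P*L = 0 and keeps the physical root.
    """
    b = protein_uM + ligand_uM + kd_uM
    pl = (b - math.sqrt(b * b - 4.0 * protein_uM * ligand_uM)) / 2.0
    return pl / protein_uM


if __name__ == "__main__":
    protein = protein_molarity_uM(10.0, 40_000.0)  # assumed 40-kDa protein
    print(f"protein concentration: {protein:.0f} uM")
    for ligand, kd in [(1000.0, 100.0), (2500.0, 10.0)]:  # illustrative values
        approx = fraction_bound_excess(ligand, kd)
        exact = fraction_bound_exact(protein, ligand, kd)
        print(f"[L] = {ligand:.0f} uM, Kd = {kd:.0f} uM: "
              f"Eq. (10.1) {approx:.3f}, exact {exact:.3f}")
```

With 1 mM ligand and an assumed Kd of 100 µM, Eq. (10.1) predicts about 91 percent occupancy, while the exact treatment, which accounts for ligand taken up by 250 µM protein, gives about 89 percent; the gap shrinks as the ligand excess grows.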
Co-crystallization requires the formation of the ligand–protein complex
prior to the crystallization process. This may be achieved simply by mixing
a protein solution with the ligand solution. In some cases where the off-rate
of the ligand–protein complex is very low, a purification of the complex by size
exclusion chromatography is feasible. It is evident that adding the small mol-
ecule, and possibly an organic solvent, usually DMSO (dimethylsulfoxide),
and the formation of two protein species (apo-protein and ligand–protein
complex) may alter the crystallization regime and thus the crystal quality or
crystallization success. Therefore, co-crystallizations are frequently optimized
for a particular ligand.
The ligand binding event often triggers subtle conformational transitions
toward compacted protein structures, preventing the formation of those
crystal contacts that are present in the respective apo-crystals. Alternatively,
ligand binding may lead to the formation of crystals with a different packing
arrangement and space group. This so-called ligand-dependent crystal
polymorphism [35] can be used as a tool for (a) detecting ligand binding or
(b) identifying initial crystallization conditions. The latter may be employed
practically by screening a set of ligands as additives in co-crystallization
experiments. Because the resulting ligand–protein complexes differ slightly in
their conformations, different crystal contacts and packing arrangements are
sampled, thus increasing the chance of successful crystallization. Once a
crystallization condition is identified, it may be used as a starting point for crystallizations with
further ligands. This strategy is supported by the many cases where crystals of
protein–ligand complexes have been obtained more readily than those of the
respective apo-forms. Indeed, apo-forms of many ligand binding proteins have
never been crystallized at all. It must be noted, however, that a particular crys-
tallization condition that readily yields crystals of the apo-crystal form may
not necessarily produce any crystals of ligand-bound forms. Most troubling, a
given crystallization condition may not be compatible with binding of the
ligand to the protein. Such a case may be envisaged where a protein crystal-
lizes at low pH but does not bind its ligand at that pH.
Soaking compounds into pre-formed crystals is very popular because it
shortcuts the possibly tedious process of identifying optimized
co-crystallization conditions for a ligand. Soaking is performed either by
(a) transferring the crystal into a solution consisting of precipitant solution
and the ligand compound or (b) adding the ligand compound directly to the
crystal-containing crystallization drop. It is difficult to estimate the preferred
incubation time since the formation of the ligand–protein complex depends on
the on-rate, diffusion of the compound within the crystal, and the component
concentrations. In order to render hydrophobic drug leads soluble in aqueous
solutions, organic solvents, detergents, or cyclodextrins have been employed.
The ligand compound or the solubilizing agent may have detrimental
effects on the X-ray diffraction quality of crystals. In many cases visible cracks
form upon soaking of ligands into apo-crystals, presumably because substantial
ligand-induced conformational transitions occur that are not compatible with
the original packing of the apo-crystal form. In such cases, co-crystallization
experiments are warranted. Even though the formation of visible cracks renders
crystals unusable for diffraction purposes, the effect may, for special cases, be
used as an indicator for ligand binding [36].
Both soaking and co-crystallization may be employed for crystal-based
drug discovery. The locations and binding modes of lead compounds and the
specific interactions of the small-molecule atoms with those of the protein
provide unique insight that can be used in several ways (see Section 10.6). Fur-
thermore, cocktails of structurally diverse compounds or fractions of fragment
libraries have been co-crystallized or soaked into apo-crystals, and the
resulting crystal structures revealed bound high-affinity ligands [1], thus
forming the basis of crystallographic screening.
Derivatization for Anomalous Diffraction Experiments
In order to obtain crystallographic phases, crystals may be derivatized and
used to collect anomalous X-ray diffraction data. In cases where proteins are
expressed recombinantly, the protein may be labeled in vivo with selenium by
substituting selenomethionine for the sulfur-containing methionine residues.
This method has become very popular due to the availability of tunable X-ray
sources at synchrotrons. Chemical derivatization with heavy metals is a
soaking or co-crystallization-based method. Cysteine, histidine, and methion-
ine residues may react with heavy atoms such as mercury, gold, platinum, and
iridium, whereas glutamate and aspartate can be complexed with lanthanides
and actinides such as samarium and uranium [37]. Popular derivatizing
reagents are K2PtCl4, KAu(CN)2, Hg(CH3COO)2, Pt(NH3)2Cl2, and HgCl2 [38].
The binding of a particular heavy-metal compound to a protein may be
screened prior to the crystallization and diffraction experiment by a simple
native gel shift assay. Once a suitable reagent has been identified, the
protocols for binding heavy-metal compounds to proteins are similar to those
for binding small-molecule compounds to form protein–drug complexes.
Crystallization Data Sources
More than 25,000 protein crystal structures have been deposited in the pub-
licly accessible Protein Data Bank (PDB) (http://www.rcsb.org/pdb/). The
crystallization conditions are available within the PDB files under section
REMARK 280. As an example, the crystallization conditions for hen egg-white
lysozyme given in PDB entry 3LZT are shown below:
REMARK 280 CRYSTALLIZATION CONDITIONS: BATCH METHOD USED. 1% PROTEIN
REMARK 280 SOLUTION IN 100MM SODIUM ACETATE PH 4.5-4.6. SODIUM
REMARK 280 NITRATE ADDED TO A CONCENTRATION OF 20MGS/ML. CRYSTALS
REMARK 280 GROWN AT ROOM TEMPERATURE.
Here, the batch crystallization method was used: a 10-mg/mL (1 percent)
solution of hen egg-white lysozyme in 100 mM sodium acetate buffer at pH 4.5
to 4.6 was prepared, and sodium nitrate was added to a concentration of
20 mg/mL. Scientific publications describing protein structures usually list
crystallization conditions in their methods section. A specialized protein
crystallization periodical does not exist. The International Union of
Crystallography (IUCr), however, publishes the monthly journal Acta
Crystallographica Section D: Biological Crystallography, which hosts a section
on crystallization papers in which crystallization methods, conditions, and
technical advances in crystallization methodology are reported.
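Because the conditions in REMARK 280 are free text, pulling them out of a file is a simple parsing job. The Python sketch below assumes the entry has already been downloaded in the legacy PDB text format; the function names are hypothetical, the sample reuses the 3LZT lines quoted above (padded with an empty REMARK 280 line and a following REMARK 290 header for illustration), and the pH lookup is a naive heuristic, since authors do not write conditions in a standardized way.

```python
import re

# The 3LZT REMARK 280 lines quoted in the text, plus neighboring records
SAMPLE = """\
REMARK 280
REMARK 280 CRYSTALLIZATION CONDITIONS: BATCH METHOD USED. 1% PROTEIN
REMARK 280 SOLUTION IN 100MM SODIUM ACETATE PH 4.5-4.6. SODIUM
REMARK 280 NITRATE ADDED TO A CONCENTRATION OF 20MGS/ML. CRYSTALS
REMARK 280 GROWN AT ROOM TEMPERATURE.
REMARK 290
"""


def remark_280_text(pdb_text: str) -> str:
    """Join the free text of all REMARK 280 records in PDB-format text."""
    parts = []
    for line in pdb_text.splitlines():
        # Columns 1-6 hold the record name, columns 8-10 the remark number,
        # and the free text starts in column 12.
        if line[:6] == "REMARK" and line[7:10] == "280":
            text = line[11:].strip()
            if text:
                parts.append(text)
    return " ".join(parts)


def reported_ph(conditions: str):
    """Return the first pH value mentioned, or None (naive text heuristic)."""
    match = re.search(r"\bPH\s*(\d+(?:\.\d+)?)", conditions)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    conditions = remark_280_text(SAMPLE)
    print(conditions)
    print("pH:", reported_ph(conditions))
```

Looping the same function over many downloaded entries is one way to collect conditions for proteins related to a new target.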
Finally, the crystallization database BMCD (Biological Macromolecule
Crystallization Database), set up by Gary Gilliland, is a useful resource
for finding crystallization conditions of previously crystallized proteins
[30, 39]. Version 2.0 of the BMCD is available on the Web at
http://wwwbmcd.nist.gov:8080/bmcd/bmcd.html.
Protein Crystallization Demonstration Experiment
Crystals of hen egg white lysozyme may be grown readily for demonstration
purposes. A detailed procedure is given below for the crystallization of
lysozyme according to the sitting drop vapor diffusion format.
1. Prepare a 100-mg/mL lysozyme solution by weighing out 100 mg of
lyophilized chicken egg white lysozyme [Sigma (L7651)] into a tube and
adding 1 mL of 50 mM sodium acetate at pH 4.5.
2. Prepare the precipitant solution: 30 percent (w/v) MPEG [monomethyl
polyethylene glycol 5000, Sigma (M7268)], 1 M sodium chloride, 50 mM
sodium acetate at pH 4.5.
3. Fill 200 µL of the precipitant solution into the reservoir.
4. Pipet 2 µL of the reservoir solution into the crystallization well.
5. Add 2 µL of the lysozyme solution to the drop in the crystallization well.
6. Crystals similar to those shown in Figure 10.4 appear within 1 h.
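The concentrations in the drop follow from simple mixing arithmetic. The Python sketch below computes the drop composition right after steps 4 and 5 and an idealized end point of vapor diffusion; that end point is an assumption (water leaves the drop until its precipitant concentration matches the much larger reservoir, and all nonvolatile components concentrate by the same factor), and real drops may not equilibrate this far before crystals form.

```python
# Compositions from steps 1 and 2 (units are part of each key)
PROTEIN_SOLUTION = {"lysozyme (mg/mL)": 100.0, "sodium acetate (mM)": 50.0}
RESERVOIR = {"MPEG 5000 (% w/v)": 30.0, "NaCl (M)": 1.0, "sodium acetate (mM)": 50.0}


def mix(c1: float, v1: float, c2: float, v2: float) -> float:
    """Concentration after combining volume v1 at c1 with volume v2 at c2."""
    return (c1 * v1 + c2 * v2) / (v1 + v2)


def drop_composition(protein_ul: float = 2.0, reservoir_ul: float = 2.0) -> dict:
    """Composition of the drop right after steps 4 and 5."""
    names = sorted(set(PROTEIN_SOLUTION) | set(RESERVOIR))
    return {
        name: mix(PROTEIN_SOLUTION.get(name, 0.0), protein_ul,
                  RESERVOIR.get(name, 0.0), reservoir_ul)
        for name in names
    }


def idealized_equilibrium(drop: dict, precipitant: str = "MPEG 5000 (% w/v)") -> dict:
    """Idealized end point: the drop loses water until its precipitant
    concentration matches the reservoir; everything else concentrates by
    the same factor."""
    factor = RESERVOIR[precipitant] / drop[precipitant]
    return {name: conc * factor for name, conc in drop.items()}


if __name__ == "__main__":
    start = drop_composition()
    end = idealized_equilibrium(start)
    for name in start:
        print(f"{name:22s} start {start[name]:7.2f}   idealized end {end[name]:7.2f}")
```

Mixing 2 µL + 2 µL halves the protein to 50 mg/mL and the MPEG to 15 percent; under the idealized assumption, the drop then shrinks by half, bringing the lysozyme back toward 100 mg/mL.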
10.3 X-RAY DIFFRACTION EXPERIMENT
The information required to determine the three-dimensional crystal structure
of a protein can be extracted from X-ray diffraction data collected from a crystal of
the target protein. In this section, basic diffraction theory and diffraction data
collection practice will be discussed.
Diffraction Theory
Images of objects with dimensions ranging in size from a single cell to macro-
scopic scale are readily obtainable by visible light photography or microscopy.
images. In a protein X-ray crystallography experiment, a crystal of the target
protein is placed in an intense X-ray beam of a particular wavelength λ. The
protein crystal is a three-dimensional array of diffracting objects (the atoms
and their electrons), which scatters the incident X-rays to produce a
three-dimensional diffraction pattern. The three-dimensional diffraction
pattern is typically recorded in two-dimensional sections by a detection plate.
Figure 10.8 shows sample protein crystal diffraction images. Each of the dark
spots in the diffraction images represents a diffracted X-ray beam whose
intensity is a maximum of constructive interference between scattered waves.
In a typical protein diffraction data collection, tens or hundreds of thousands
of such diffraction spots are collected from a single crystal.
[Figure 10.5 labels: incident plane wave; double-slit diffraction; single-slit envelope]
Figure 10.5 Diffraction of waves passing through two slits [40]. As radiation passes
through a slit of width comparable to the radiation wavelength, the slit scatters the
radiation in all directions. In this example, each of the two slits on the left acts as an
emitter of the incident radiation. The radiation emitted by the two slits will interfere,
resulting in the solid-line diffraction pattern on the right. (The relative heights of the peaks
in the diffraction pattern correspond to intensity.) Interfering waves that do not have
a large angular deviation from the incident radiation are largely in phase with each
other, resulting in intensity maxima. As the angular deviation of the scattered waves
from the incident radiation increases, the scattered waves are less in phase with
each other, resulting in the general intensity falloff as one moves away from the center
of the diffraction pattern. Shown in broken line is the diffraction pattern of a single
slit.
In the analysis of the X-ray diffraction pattern, each of the spots (also
referred to as reflections) is designated Fhkl, where the three integers h, k,
and l are called Miller indices. The Miller indices give the position of the
reflection relative to the orientation of the crystal during the data collection.
Fourier Synthesis in X-ray Crystallography
Since each of the spots in the diffraction pattern represents a wave (the
diffracted X-rays), Fourier synthesis of these diffracted waves can be used to
reconstruct the diffracting array, namely the electron density of the protein in
the crystal. The atoms of the protein, known from its amino acid or nucleic
acid sequence, are then built into the reconstructed electron density to give the
three-dimensional structure of the target protein.

[Figure 10.6 panels: (a) phase difference 0, resultant amplitude 2.0; (b) phase difference λ/4, resultant amplitude 1.4; (c) phase difference λ/2, resultant amplitude 0.0; each component wave has amplitude 1.0]
Figure 10.6 Constructive and destructive interference of waves [41]. The resultant
wave produced by two or more interfering component waves is obtained by adding
their amplitudes at corresponding points. (a) Total constructive interference. The two
component waves are completely in phase. The maximum amplitude of the resultant
wave is double that of each component wave. (b) Partial constructive interference. The
two component waves are 90° out of phase. The resultant wave has an amplitude of
about 1.4 (√2) times the amplitude of each component wave. (c) Total destructive
interference. The two component waves are 180° out of phase. The two component
waves cancel each other out, and the resultant wave has zero amplitude.
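The amplitudes in Figure 10.6 follow from adding two waves of equal wavelength as phasors. The short Python sketch below reproduces them; the function names are hypothetical, and it assumes only two components with a fixed phase difference, converting the path differences of the figure (0, λ/4, λ/2) into phase differences.

```python
import math


def phase_from_path(path_difference: float, wavelength: float) -> float:
    """Convert a path difference into a phase difference in radians."""
    return 2.0 * math.pi * path_difference / wavelength


def resultant_amplitude(a1: float, a2: float, phase_difference: float) -> float:
    """Amplitude of the sum of two waves of equal wavelength.

    Adds the waves as phasors: |a1 + a2*exp(i*delta)|
    = sqrt(a1^2 + a2^2 + 2*a1*a2*cos(delta)).
    """
    value = a1 * a1 + a2 * a2 + 2.0 * a1 * a2 * math.cos(phase_difference)
    return math.sqrt(max(value, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    wavelength = 1.0
    for label, path in [("0", 0.0), ("lambda/4", 0.25), ("lambda/2", 0.5)]:
        delta = phase_from_path(path, wavelength)
        print(f"path difference {label:8s} -> amplitude "
              f"{resultant_amplitude(1.0, 1.0, delta):.3f}")
```

The three cases print amplitudes of 2.000, 1.414, and 0.000, matching panels (a) to (c).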
Fourier syntheses are mathematical calculations used to reconstruct any
regularly repeating pattern, regardless of complexity, via the summation of
relatively simple sine and cosine curves [Eq. (10.2)].
$$f(x) = a_0 + \sum_{h=1}^{n} \left[ a_h \cos(2\pi h x) + b_h \sin(2\pi h x) \right] \qquad (10.2)$$

[Figure 10.7 columns: original grating (array axes a, b; angle γ) and diffraction pattern (reciprocal axes a*, b*; angle γ*; spacings K/a, K/b)]
Figure 10.7 Schematic representations of diffracting arrays (left column) and their
resultant diffraction patterns (right column). In the diffracting arrays, the black dots
represent holes. In the diffraction patterns, the black lines and black dots represent
intensity maxima of the diffracted radiation. Note that the sampling in the diffraction
pattern is along the direction of the diffracting array. Comparing the top and middle
examples shows that the sampling regions are farther apart in the diffraction pattern
of the middle example, due to the closer spacing of the holes in the diffracting array.
The bottom example shows a two-dimensional diffracting array. Its diffraction pattern
shows discrete sampling in two directions, resulting in spots of intensity maxima in the
diffraction pattern, as opposed to lines in the other two examples [41].

As a very rudimentary example of a Fourier synthesis, consider the square
wave in Figure 10.9. At first glance, it may appear impossible to reconstruct
this function, especially its sharp corners, from smoothly curved sines and
cosines. The summation of only four terms in the Fourier series, however,
results in a function that is clearly beginning to take on the shape of the square
wave. Adding more Fourier terms produces a resultant function that looks
increasingly like the target square wave.
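The square-wave example can be reproduced with Eq. (10.2). The Python sketch below assumes a square wave of period 1 that is +1 on the first half-period and −1 on the second; for this odd function all a_h vanish and b_h = 4/(πh) for odd h, which is the standard Fourier series of a square wave. The function names are hypothetical.

```python
import math


def square_wave(x: float) -> float:
    """Target function: +1 on the first half of each period, -1 on the second."""
    return 1.0 if (x % 1.0) < 0.5 else -1.0


def fourier_partial_sum(x: float, n_terms: int) -> float:
    """Sum the first n_terms nonzero terms of Eq. (10.2) for the square wave.

    For this odd square wave a_0 = 0 and all a_h = 0; b_h = 4 / (pi * h)
    for odd h and 0 for even h.
    """
    total = 0.0
    for k in range(n_terms):
        h = 2 * k + 1  # only odd harmonics contribute
        total += 4.0 / (math.pi * h) * math.sin(2.0 * math.pi * h * x)
    return total


def rms_error(n_terms: int, samples: int = 1000) -> float:
    """Root-mean-square deviation from the square wave over one period."""
    sq = 0.0
    for i in range(samples):
        x = (i + 0.5) / samples  # midpoints avoid the jump discontinuities
        sq += (fourier_partial_sum(x, n_terms) - square_wave(x)) ** 2
    return math.sqrt(sq / samples)


if __name__ == "__main__":
    for n in (1, 2, 4, 16, 64):
        print(f"{n:3d} terms: rms error {rms_error(n):.3f}")
```

The root-mean-square error falls steadily as terms are added, although a small overshoot persists next to each jump (the Gibbs phenomenon), so the sharp corners are only approached in the limit.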
In protein X-ray crystallography, the “wave” that one tries to reconstruct is
the electron density within the protein crystal. The electron density in the
crystal is a regularly repeating three-dimensional function. The wave compo-
nents [the sines and cosines in Eq. (10.2)] used to reconstruct the electron
density of the target macromolecule are the individual spots of the X-ray dif-
fraction pattern. Equation (10.3) gives the electron density of the target
protein at a point (x, y, z):
$$\rho(x, y, z) = \frac{1}{V} \sum_{hkl} |F_{hkl}| \cos\left[ 2\pi (hx + ky + lz) - \alpha(hkl) \right] \qquad (10.3)$$

In Eq. (10.3), the electron density at a three-dimensional coordinate
position (x, y, z), ρ(x, y, z), is expressed as the Fourier summation of all of the
spots in the diffraction image. The diffraction spots, Fhkl, are waves, which have
amplitude and phase components. Their amplitude components are expressed
as |Fhkl|. Their phase components are the α(hkl) terms in Eq. (10.3), where
α(hkl) is the angular phase of Fhkl. The factor 1/V is the reciprocal of the unit
cell volume of the sample crystal. The Fourier summation of the electron
density, Eq. (10.3), appears to lack the sine terms of the general Fourier
summation, Eq. (10.2). Because Friedel’s law (Section 10.4) generally holds in
protein diffraction, each reflection Fhkl is paired with the reflection of opposite
Miller indices, which has the same amplitude but the opposite phase angle;
within each such pair, the sine terms of Eq. (10.2) cancel out.

Only the amplitude components of the Fhkl’s are directly measurable by the
X-ray detection hardware used in the diffraction experiment. The phase
components are determined by mathematical procedures that are discussed in
Section 10.4. The important thing to note is that in Eq. (10.3), as was the

Figure 10.8 Representative diffraction patterns used in protein X-ray crystal structure
determination. Each of the spots in these diffraction patterns is generated by
diffracted X-rays. Note the general tendency for the intensities of the diffraction spots to
decrease as one moves farther away from the center of the diffraction pattern.
of a crystal is the smallest portion of the crystal that can be used to generate
the entire crystal by whole unit cell translations along the a, b, c cell edges. For
protein crystals, the unit cell edges are on the order of 50 to 400 Å. The
unit cell is analogous to the scattering objects in Figure 10.7. A protein crystal
with a large unit cell will produce a diffraction pattern with closely spaced
diffraction maxima, while a protein crystal with a small unit cell will produce a
diffraction pattern with widely spaced diffraction maxima. Because of this
inverse relationship between unit cell size and diffraction pattern spacing, the
lattice of diffraction maxima is called the reciprocal lattice of the actual crystal
lattice.
Within each unit cell, there can be multiple copies of the protein molecule.
These copies within a unit cell are related to all of the other copies in the
unit cell by mathematical symmetry operations, such as rotations or
translations. The smallest portion of the unit cell that can be used to generate
the entire unit cell by the symmetry operations is called the asymmetric
unit. The expression for the electron density [Eq. (10.3)] refers to an entire unit
cell. In practice, since the contents of the unit cell can be generated by the math-
ematical symmetry operations, the X-ray crystallographic determination of
protein structures generally refers to a single asymmetric unit of the crystal.
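Eq. (10.3) can be tried out in one dimension, where the unit cell is simply a segment whose length stands in for V. The Python sketch below is a toy, not a crystallographic program: the atom positions and strengths are hypothetical, atoms are treated as point scatterers (no atomic form factor or thermal falloff), and the sum is truncated at |h| ≤ 20. It computes structure factors for the toy cell, then resynthesizes the density from the amplitudes using either the correct phases or all phases set to zero.

```python
import cmath
import math

# Hypothetical one-dimensional unit cell: (fractional position, scattering strength)
ATOMS = [(0.20, 8.0), (0.55, 6.0), (0.70, 7.0)]
CELL_LENGTH = 1.0  # stands in for the unit cell volume V of Eq. (10.3)


def structure_factor(h: int) -> complex:
    """F_h as the vector (complex) sum of point-atom contributions."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in ATOMS)


def density(x: float, amplitudes: dict, phases: dict) -> float:
    """One-dimensional form of Eq. (10.3): sum of |F_h| cos(2 pi h x - alpha_h)."""
    total = sum(amplitudes[h] * math.cos(2.0 * math.pi * h * x - phases[h])
                for h in amplitudes)
    return total / CELL_LENGTH


def peak_position(amplitudes: dict, phases: dict, samples: int = 1000) -> float:
    """Fractional coordinate of the highest density value on a regular grid."""
    grid = [i / samples for i in range(samples)]
    values = [density(x, amplitudes, phases) for x in grid]
    return grid[values.index(max(values))]


if __name__ == "__main__":
    H_MAX = 20
    fs = {h: structure_factor(h) for h in range(-H_MAX, H_MAX + 1)}
    amps = {h: abs(f) for h, f in fs.items()}          # what the detector gives
    true_phases = {h: cmath.phase(f) for h, f in fs.items()}
    zero_phases = {h: 0.0 for h in fs}                  # phases "unknown"
    print("highest peak, correct phases:", peak_position(amps, true_phases))
    print("highest peak, zero phases:   ", peak_position(amps, zero_phases))
```

With the correct phases, the highest density lies at the strongest scatterer (x = 0.20); with the phases discarded, the highest value always falls at the origin, because every cosine term equals 1 there. The measured amplitudes alone therefore do not locate the atoms.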
Bragg’s Law and the Angular Dependence of X-ray Diffraction
The somewhat complicated phenomenon of diffraction by the three-
dimensional array of a protein crystal can be conceptually simplified by treat-
ing diffraction as the reflection of the incident X-ray beam from planes within
the crystal. These planes are planes of electrons in the crystal, with the elec-
trons being concentrated at the atoms. The incident X-rays can be thought of
as reflecting off of these planes, resulting in the spots in the diffraction pattern
where the X-rays strike the detection plate. The diffraction spots are there-
fore termed reflections. The geometric formulation of diffraction as reflections
from planes is shown in Figure 10.10.
In Figure 10.10, the incident X-rays of wavelength λ are represented by 1
and 2 and are reflected by planes P1 and P2 in the crystal with an interplanar
separation d, resulting in the reflected X-rays 1′ and 2′. The incident X-rays 1
and 2 make an angle θ with the planes P1 and P2. If the reflected X-rays 1′ and
2′ are to result in a beam of maximum intensity, the X-rays represented by 1
and 2, and 1′ and 2′, must be in phase. In this geometric construction, for 1′
and 2′ to be in phase to produce an intensity maximum, the extra distance
traveled by 2 and 2′ compared with the distance traveled by 1 and 1′ must
be an integral number of wavelengths λ. Thus,

$$ACB = 2\,AC = n\lambda \qquad (10.4)$$

where n is an integer. Since AC/d = sin θ, Eq. (10.4) can be rewritten as
$$n\lambda = 2d \sin\theta \qquad (10.5)$$

Equation (10.5) is Bragg’s law, which was derived by Sir William H. Bragg
and his son Sir William L. Bragg, who were awarded the 1915 Nobel Prize in
Physics for their work in X-ray crystallography. The interplanar spacing d is
the resolution of the particular diffracted wave. The resolution can be thought
of as the resolvable distance between planes that “reflect” X-rays and lead to
diffraction. It can be seen from Bragg’s law that, for constant n, diffracted
X-rays with a greater angular divergence from the incident X-rays (those with
high θ, farther away from the center of the diffraction pattern) result from
reflections off planes with small interplanar spacings d. Therefore, the
reflections in the diffraction pattern with small interplanar spacings d and
large θ are the high-resolution reflections.

[Figure 10.10 labels: incident rays 1 and 2; reflected rays 1′ and 2′; planes P1 and P2 separated by d; points A, B, C; angle θ]
Figure 10.10 Geometric construction for the Bragg theory of diffraction [42].
Parallel planes of electrons in the crystal are represented by P1 and P2. The incident
X-rays are represented by rays 1 and 2. Diffraction is treated as “reflection” of rays 1
and 2 off the planes P1 and P2, resulting in the diffracted waves represented by rays
1′ and 2′.

The Bragg construction of diffraction also provides a description of the
Miller indices hkl assigned to a particular reflection. The reflection Fhkl can be
considered to be the resultant reflection off the family of equally spaced planes
that intersect the unit cell edges a, b, and c h, k, and l times, respectively. A
low-resolution reflection will have low integer values for hkl. For example, the
reflection F100 is the reflection off the planes that intersect the a cell edge once
and never intersect the b and c cell edges, that is, the planes parallel to the b
and c cell edges. The spacing of these planes equals the a unit cell edge
(strictly, when the a axis is perpendicular to the b and c axes). For a typical
protein crystal, the resolution of the F100 reflection, d in Eq. (10.5), will
therefore be 50 to 400 Å. As the Miller indices hkl for a reflection increase, the
spacings of the reflecting planes become smaller within the finite volume of the
crystal’s unit cell. Reflections Fhkl with higher integer values of hkl are
therefore the higher-resolution reflections in an X-ray diffraction data set.
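Bragg’s law and the link between Miller indices and resolution can be put into numbers. The Python sketch below is a minimal illustration under stated assumptions: the 60 × 80 × 100 Å cell is hypothetical, the d-spacing formula 1/d² = (h/a)² + (k/b)² + (l/c)² applies only to cells with mutually perpendicular edges (orthorhombic, tetragonal, cubic), and the 1.5418 Å wavelength is that of Cu Kα radiation, typical of home laboratory sources.

```python
import math


def bragg_theta_deg(d: float, wavelength: float, n: int = 1) -> float:
    """Bragg angle theta (degrees) from n*lambda = 2 d sin(theta); lengths in A."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        raise ValueError("no reflection: d is smaller than n*lambda/2")
    return math.degrees(math.asin(s))


def bragg_d(theta_deg: float, wavelength: float, n: int = 1) -> float:
    """Interplanar spacing d (resolution) recorded at Bragg angle theta."""
    return n * wavelength / (2.0 * math.sin(math.radians(theta_deg)))


def d_spacing_orthogonal(h: int, k: int, l: int, a: float, b: float, c: float) -> float:
    """d_hkl for a cell with mutually perpendicular edges (orthorhombic,
    tetragonal, or cubic): 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


if __name__ == "__main__":
    wavelength = 1.5418         # Cu K-alpha, typical of home X-ray sources
    cell = (60.0, 80.0, 100.0)  # hypothetical orthorhombic cell in A
    for hkl in [(1, 0, 0), (5, 3, 2), (20, 10, 5)]:
        d = d_spacing_orthogonal(*hkl, *cell)
        theta = bragg_theta_deg(d, wavelength)
        print(f"hkl={hkl}: d = {d:6.2f} A, theta = {theta:5.2f} deg")
```

As hkl grows, d shrinks and θ grows, so the high-resolution reflections land farther from the center of the detector; the ValueError marks the physical limit d ≥ λ/2 for first-order reflections.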
Another piece of hardware becoming more common at synchrotron beamlines is
the robotic sample mounter. These devices allow multiple crystals, on the order
of 100, to be loaded into a magazine, which is generally filled with liquid nitro-
gen to keep the crystals frozen. The crystals are then automatically mounted
and retrieved from the goniometer without the need for human intervention
[49, 50].
Synchrotrons are massive public works projects, with physical scale on the
order of kilometers, built by governmental agencies and costing around $1
billion. Some examples are the Advanced Photon Source at Argonne National
Laboratory in Argonne, Illinois, and the SPring-8 synchrotron in Hyogo, Japan.
While synchrotrons were originally built for high-energy particle physics
experiments, their use in protein crystallography is growing rapidly. There are
currently on the order of 50 to 100 synchrotron beamlines worldwide dedi-
cated to protein crystallography, with new ones being commissioned [51].
10.4 X-RAY CRYSTAL STRUCTURE DETERMINATION
The previous section discussed general diffraction theory and diffraction data
collection. This section gives an overview of determining the protein
crystal structure from the diffraction data.
Phase Problem
As described in the previous Section 10.3, the electron density of a protein,
and therefore its three-dimensional structure, can be reconstructed by the
summation of sine and cosine terms. Such a summation is called a Fourier syn-
thesis. The sine and cosine terms are the collected diffraction data, the reflec-
tions in the diffraction images (Fig. 10.8). These reflections are the X-rays
diffracted by the crystal. Since these X-rays are waves, they have both ampli-
tude and phase components.
The amplitude components of the diffraction data can be measured directly
by the X-ray detector, which measures the intensities of the diffraction spots.
The phase components, however, cannot be measured directly. This inability
to directly measure the phase in diffraction data is referred to as the phase
problem. In this section, various methods for determining phases for protein
X-ray diffraction data will be discussed.
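One elementary way to see why the measured amplitudes cannot fix the phases: moving every atom by the same amount changes every phase but no amplitude. The Python sketch below uses a hypothetical one-dimensional structure of point atoms and the vector-sum form of the structure factor, F_h = Σ f_j exp(2πi h x_j); it is an illustration, not a phasing method.

```python
import cmath
import math


def structure_factors(atoms, h_values):
    """F_h = sum_j f_j exp(2*pi*i*h*x_j) for point atoms in a 1-D cell."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
            for h in h_values}


# Hypothetical 1-D structure and the same structure shifted by a quarter cell
atoms = [(0.10, 8.0), (0.35, 6.0), (0.62, 7.0)]
shift = 0.25
shifted = [((x + shift) % 1.0, f) for x, f in atoms]

hs = range(1, 6)
original = structure_factors(atoms, hs)
moved = structure_factors(shifted, hs)
for h in hs:
    print(f"h={h}: |F| {abs(original[h]):7.3f} vs {abs(moved[h]):7.3f}; "
          f"phase {math.degrees(cmath.phase(original[h])):8.1f} vs "
          f"{math.degrees(cmath.phase(moved[h])):8.1f} deg")
```

Each phase changes by 2πh × 0.25 while every |F_h| is unchanged, so intensity measurements alone cannot tell the two placements apart.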
Structure Factor
In a diffraction image, each of the reflections identified by the Miller indices
hkl is described mathematically by a structure factor Fhkl. The structure factor
is the resultant vector of all of the waves diffracted in the direction of the spot
hkl by all of the atoms in the crystal. The structure factor corresponding to a
spot hkl is therefore the vector sum of the structure factors of each atom in
the crystal. The structure factor of an individual atom in the crystal can be
expressed as:
fraction resolution, at which crystals can still be useful, to complete loss of
diffraction. It is not uncommon to screen tens or hundreds of different
heavy atoms and soaking conditions to get a single heavy-atom-soaked protein
crystal that diffracts and maintains the unit cell dimensions of the unsoaked
protein crystal.
Due to the symmetry of the crystals, there are generally multiple copies of
the protein molecule in the unit cell of the crystal, with one or more copies of
the protein in the asymmetric unit of the crystal’s unit cell. The heavy atoms
will bind to each of the multiple protein molecules in the unit cell and will
therefore occupy all of the asymmetric units in the crystal. The symmetry
operations relating each asymmetric unit to another cause maxima in the
Patterson map to occur on certain sections of uvw Patterson space (the Harker
sections). Determining the atom positions from Patterson maps can be
done manually by relatively simple algebraic equations. There are now,
however, numerous software suites that automatically determine heavy atom
positions from Patterson maps, and the modern protein crystallographer
rarely, if ever, determines heavy atom positions manually.
X-ray diffraction data collected from a native protein crystal lacking heavy
atoms (data set P) and diffraction data from a crystal of the same protein
containing heavy atoms (data set PH) will differ in the measured intensities
of their reflections solely by the contribution of the heavy atoms in data set
PH. These differences in intensities can be used to calculate an isomorphous
difference Patterson map, which will show peaks corresponding to the
interatomic vectors between the heavy atoms (Fig. 10.15).
The isomorphous difference Patterson function calculated from data sets P
and PH is expressed as:
$$P(u) = \frac{1}{a} \sum_{h} \left( |F_{P}(h)| - |F_{PH}(h)| \right)^{2} \cos(2\pi h u) \qquad (10.12)$$

Compare this expression for the isomorphous difference Patterson with
that of the general Patterson function [Eq. (10.11)]. In the isomorphous
difference Patterson, the amplitude term is the squared difference between the
amplitudes in the native protein and heavy-atom-derivatized protein data sets.
The amplitude differences due to the presence of heavy atoms in data set PH will
show up as strong peaks, allowing interpretation of the isomorphous difference
Patterson maps.

The coordinates of the heavy atoms calculated from the positions of the
Patterson peaks essentially determine the heavy atom substructure of data set
PH. The amplitudes of the reflections from the heavy-atom-only substructure
of data set PH are easily calculated by the appropriate summation of heavy
atom scattering factors, which are listed in standard tables. The phases of the
reflections from the heavy-atom-only substructure are calculated from Eq.
(10.8). Once the heavy atom substructure has been determined, phases can be
determined for the protein diffraction data.
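Eq. (10.12) can be exercised on a one-dimensional toy problem. The Python sketch below assumes hypothetical point atoms: eight light "protein" atoms and two heavy-atom sites 0.25 of the cell apart. It computes amplitudes for data sets P and PH and evaluates the isomorphous difference Patterson away from the origin peak that every Patterson map contains. Because ||F_PH| − |F_P|| can never exceed the heavy-atom amplitude, the difference coefficients behave roughly like a heavy-atom-only Patterson.

```python
import cmath
import math


def amplitudes(atoms, h_max):
    """|F(h)| for point atoms (fractional x, strength f), h = -h_max..h_max."""
    return {h: abs(sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms))
            for h in range(-h_max, h_max + 1)}


def difference_patterson(amp_p, amp_ph, u, cell_length=1.0):
    """Eq. (10.12): (1/a) * sum_h (|F_P(h)| - |F_PH(h)|)^2 * cos(2*pi*h*u)."""
    return sum((amp_p[h] - amp_ph[h]) ** 2 * math.cos(2.0 * math.pi * h * u)
               for h in amp_p) / cell_length


# Hypothetical 1-D model: light "protein" atoms plus two heavy-atom sites
PROTEIN = [(x, 6.0) for x in (0.05, 0.22, 0.31, 0.48, 0.63, 0.77, 0.86, 0.93)]
HEAVY = [(0.15, 80.0), (0.40, 70.0)]
H_MAX = 30

amp_p = amplitudes(PROTEIN, H_MAX)           # data set P
amp_ph = amplitudes(PROTEIN + HEAVY, H_MAX)  # data set PH

# Skip the origin peak near u = 0 (and its periodic image near u = 1)
grid = [i / 1000 for i in range(50, 951)]
values = [difference_patterson(amp_p, amp_ph, u) for u in grid]
peak_u = grid[values.index(max(values))]
print(f"strongest non-origin peak at u = {peak_u:.3f}")
```

The strongest non-origin peak falls at u ≈ 0.25 (or its centrosymmetric mate at 0.75), the vector between the two heavy-atom sites, from which the substructure can be worked out.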
mined for the measured reflections, incorporating these phases into the
Fourier summation for the electron density [Eq. (10.3)] will give an electron
density map, into which the model of the protein can be built.
After determination of the phases, electron density maps are generally
calculated by Fourier transforms using the coefficients 2Fo − Fc. The explicit
expression for the electron density map calculation is [52]:

$$\rho(x, y, z) = \frac{1}{V} \sum_{hkl} \left( 2|F_o| - |F_c| \right) \cos\left[ 2\pi (hx + ky + lz) - \alpha(hkl) \right] \qquad (10.13)$$

In Eq. (10.13), |Fo| and |Fc| are, respectively, the amplitudes of the
reflections from the experimental diffraction data and from the structure model
built into the electron density, and α is the calculated phase for the reflection
with Miller indices hkl. This type of electron density map shows the electron
density of the calculated model together with the difference electron density
between the target protein structure and the calculated model [52]. An example
of an electron density map calculated according to Eq. (10.13) is shown in
Figure 10.17. The electron density is usually represented as chicken-wire
contours into which the protein model can be built.
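Because the Fourier synthesis in Eq. (10.13) is linear in its coefficients, the statement that a 2Fo − Fc map combines model density and difference density can be made explicit. The short derivation below uses the same calculated phases α(hkl) for every term, as Eq. (10.13) does; the labels ρ_c and Δρ are introduced here only for illustration.

```latex
2|F_o| - |F_c| = |F_c| + 2\left(|F_o| - |F_c|\right)

\rho_{2F_o - F_c}(x,y,z)
  = \underbrace{\frac{1}{V}\sum_{hkl} |F_c|
      \cos\left[2\pi(hx+ky+lz) - \alpha(hkl)\right]}_{\rho_c\text{: model density}}
  + 2\,\underbrace{\frac{1}{V}\sum_{hkl} \left(|F_o| - |F_c|\right)
      \cos\left[2\pi(hx+ky+lz) - \alpha(hkl)\right]}_{\Delta\rho\text{: difference density}}
```

Equivalently, 2|Fo| − |Fc| = |Fo| + (|Fo| − |Fc|): an Fo map plus one Fo − Fc difference map. Either way, features missing from the model appear in the map, which is what makes it useful for rebuilding.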
Figure 10.17 A 2Fo–Fc electron density map calculated according to Eq. (10.13).
In practice, successful interpretation and application of MIR data to deter-
mine a protein crystal structure is not as straightforward as may appear in
Figure 10.17. In MIR, it is assumed that the measured intensity differences
between the native protein data set P and heavy atom data sets PHn are due
solely to the presence of the heavy atoms in PHn. These differences in
intensity are on the order of a few percent, so great accuracy and care are required
in the measurement and processing of the diffraction data. Also, the unit cell
parameters between the P and PHn crystals must not deviate significantly. It
is not uncommon for a heavy atom soak of a protein crystal to distort the
crystal and change the unit cell parameters. Such heavy-atom-derivatized crys-
tals are then of little value in the structure determination. Because of its rel-
atively stringent experimental requirements, MIR is becoming less and less
popular compared with other methods, described below. Still, the general con-
cepts of phase determination by MIR carry through to the other methods.
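The two practical checks described above can be expressed numerically. The Python sketch below computes the mean fractional isomorphous difference Σ||FPH| − |FP|| / Σ|FP| between reflection-matched native and derivative amplitudes (a statistic often called Riso) and the percentage change of each cell edge. The function names and example values are hypothetical, and the sketch deliberately sets no acceptance thresholds, because the nonisomorphism that can be tolerated depends on the resolution and on the size of the cell.

```python
def relative_isomorphous_difference(f_native, f_derivative):
    """Sum of | |FPH| - |FP| | divided by sum of |FP|, for matched reflections."""
    if len(f_native) != len(f_derivative) or not f_native:
        raise ValueError("need non-empty, reflection-matched amplitude lists")
    numerator = sum(abs(fph - fp) for fp, fph in zip(f_native, f_derivative))
    return numerator / sum(f_native)


def cell_change_percent(cell_native, cell_derivative):
    """Percentage change of each cell edge (a, b, c) of the derivative vs native."""
    return [100.0 * (d - n) / n for n, d in zip(cell_native, cell_derivative)]


if __name__ == "__main__":
    fp = [120.0, 85.0, 240.0, 60.0]     # hypothetical native amplitudes
    fph = [126.0, 83.0, 231.0, 64.0]    # same reflections, derivative crystal
    print(f"R_iso = {relative_isomorphous_difference(fp, fph):.3f}")
    print(cell_change_percent((78.2, 78.2, 37.1), (78.5, 78.5, 37.0)))
```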
Anomalous Dispersion Methods
Friedel’s Law As mentioned previously, X-ray diffraction can be treated as reflections of X-rays according to Bragg’s law [Eq. (10.5)]. In protein X-ray crystallography, the X-rays can be thought of as reflecting off planes of electrons (or atoms) in the crystal. Friedel’s law [Eq. (10.14)] states that the intensities of the reflection $F_{hkl}$ and the reflection $F_{\bar h\bar k\bar l}$ are equal:
$$I_{hkl} = I_{\bar h\bar k\bar l}, \qquad \bar h = -h,\ \bar k = -k,\ \bar l = -l \qquad (10.14)$$
That is, the intensities of reflections with opposite Miller indices are equal. Applied to the reflection treatment of diffraction, Friedel’s law simply states that the intensity of the reflection off the front side of a plane equals that off the back side of the same plane. This assumption is valid for diffraction data collected from native protein crystals that contain no heavy atoms, and even for heavy-atom-derivatized crystals used in MIR at certain wavelengths of incident radiation.
Anomalous Dispersion Anomalous dispersion is the phenomenon whereby certain atoms, at certain incident wavelengths, absorb some of the incident radiation instead of simply scattering it, as in normal diffraction. Typical anomalous scatterers used in protein X-ray crystallography are heavy atoms such as Hg, Au, and Pb, many of which are also used in MIR. Anomalous dispersion breaks Friedel’s law. As a result, within a single diffraction data set, there are differences between the measured intensities of reflections that would otherwise be equal in the absence of anomalous scattering. These intensity differences can be used, as in MIR, to determine the substructure of the anomalous scatterers in the crystal, which is then used to phase the diffraction data and determine the structure of the protein.
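The breaking of Friedel’s law can be demonstrated with a direct structure factor calculation, F(hkl) = Σj fj exp[2πi(hxj + kyj + lzj)], in which an anomalous scatterer is given a complex scattering factor f = f0 + f′ + if″. In the Python sketch below the atom positions and scattering factors are invented for illustration. With all scattering factors real, |F(h, k, l)| and |F(−h, −k, −l)| are equal, as Friedel’s law requires; giving one atom a nonzero f″ makes the Bijvoet mates differ. At least two kinds of atom are needed: if every atom had the same complex f, the common factor would cancel and the amplitudes would again be equal.

```python
import cmath


def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j)).

    atoms: list of (x, y, z, f) with fractional coordinates and a (possibly
    complex) scattering factor f = f0 + f' + i*f''.
    """
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for x, y, z, f in atoms)


def bijvoet_amplitudes(hkl, atoms):
    """Return (|F(h, k, l)|, |F(-h, -k, -l)|)."""
    h, k, l = hkl
    f_plus = structure_factor((h, k, l), atoms)
    f_minus = structure_factor((-h, -k, -l), atoms)
    return abs(f_plus), abs(f_minus)


if __name__ == "__main__":
    # Hypothetical light atoms (f = 6 electrons, no anomalous signal)
    light = [(0.10, 0.20, 0.30, 6.0), (0.35, 0.05, 0.40, 6.0), (0.62, 0.71, 0.15, 6.0)]
    no_anomalous = light + [(0.0, 0.0, 0.0, 70.0)]          # heavy atom, f'' = 0
    anomalous = light + [(0.0, 0.0, 0.0, 70.0 + 10.0j)]     # heavy atom, f'' = 10
    for label, model in (("f'' = 0", no_anomalous), ("f'' = 10", anomalous)):
        print(label, bijvoet_amplitudes((1, 2, 1), model))
```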
replacement case these two reflections would have the same measured intensity. A single anomalous scatterer-derivatized crystal can thus give two contributions to the Argand diagram, for the reflections $F_{hkl}$ and $F_{\bar h\bar k\bar l}$.
As in MIR, in anomalous dispersion methods the positions of the anomalous scatterers are calculated from Patterson maps. The expression for an anomalous Patterson is
$$P(\mathbf{u}) = \frac{1}{V}\sum_{\mathbf{h}}\bigl(|F^{+}(\mathbf{h})| - |F^{-}(\mathbf{h})|\bigr)^{2}\cos(2\pi\,\mathbf{h}\cdot\mathbf{u}) \qquad (10.15)$$
Compare this expression with that for the isomorphous difference Patterson [Eq. (10.12)]. In the anomalous Patterson expression, (|F+| − |F−|)² is the square of the difference in amplitude between the reflection F+ and its Bijvoet mate F−, an amplitude difference that results from anomalous dispersion. The positions of the anomalous scatterers used in phasing the diffraction data are determined from anomalous Patterson maps (Fig. 10.18). As in the isomorphous difference Patterson (Fig. 10.15), the peaks in the anomalous Patterson correspond to the interatomic vectors between the anomalous scatterers in the anomalous diffraction data set. Analysis of the anomalous Patterson peaks can reveal the substructure formed by the anomalous scatterers alone. In modern protein crystallography, direct or ab initio methods, whose mathematics will not be discussed here, are often used to locate these positions, either in place of or together with Patterson interpretation.
Combined with the data from the native crystal, the phase ambiguity can be broken using just a single anomalous scatterer derivative (Fig. 10.16). The two different measured reflections F+ and F− in the anomalous diffraction data
Figure 10.18 An anomalous Patterson map calculated according to Eq. (10.15). The
major peak occurring at approximately Y = 0.33, Z = 0.33 corresponds to the inter-
atomic vector between the anomalous scatterer atoms in the anomalous scatterer-
derivatized diffraction data set.
set correspond to the reflections FPH1 and FPH2 in Figure 10.16. This method of
using a single anomalous scatterer-derivatized crystal for phasing is called
single isomorphous replacement with anomalous scattering (SIRAS).
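The one-dimensional Python sketch below evaluates a Patterson function of the form of Eq. (10.15) on a grid of fractional coordinates u, taking V as the cell length. It assumes that coefficients are supplied for h ≥ 1 only and folds each h together with −h, which is valid because the squared Bijvoet difference is the same for both. Measured (|F+| − |F−|)² values only approximate the squared amplitudes of the anomalous substructure (they are modulated by the unknown phase difference between the total and substructure structure factors), so for a clean demonstration the example uses idealized substructure-only coefficients for two hypothetical anomalous scatterers at x = 0.10 and 0.40. With these coefficients, the largest non-origin peak falls at the interatomic vector u = 0.30 or at its centrosymmetric partner u = 0.70.

```python
import cmath
import math


def patterson_1d(coefficients, u, cell_length=1.0):
    """P(u) = (1/V) * sum over h of c_h * cos(2*pi*h*u) in one dimension.

    coefficients: dict {h: c_h} for h >= 1. For an anomalous Patterson,
    c_h = (|F+(h)| - |F-(h)|)**2, which is identical for h and -h, so each
    pair is folded into one term with a factor of 2. The F(0) term is omitted.
    """
    return (2.0 / cell_length) * sum(
        c * math.cos(2.0 * math.pi * h * u) for h, c in coefficients.items())


def idealized_substructure_coefficients(positions, h_max):
    """|sum_j exp(2*pi*i*h*x_j)|**2 for unit point scatterers (an idealization)."""
    return {h: abs(sum(cmath.exp(2j * math.pi * h * x) for x in positions)) ** 2
            for h in range(1, h_max + 1)}


def highest_peak(coefficients, n_grid=100, exclude_origin=0.05):
    """Grid index of the largest P(u), ignoring points within exclude_origin of u = 0."""
    best_i, best_p = None, -math.inf
    for i in range(n_grid):
        u = i / n_grid
        if min(u, 1.0 - u) < exclude_origin:
            continue
        p = patterson_1d(coefficients, u)
        if p > best_p:
            best_i, best_p = i, p
    return best_i


if __name__ == "__main__":
    coeffs = idealized_substructure_coefficients([0.10, 0.40], h_max=30)
    i = highest_peak(coeffs)
    print(f"highest non-origin peak at u = {i / 100:.2f}")
```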
Since the differences in measured intensities due to anomalous dispersion
are small, on the order of 3 to 5 percent of the total measured intensity [54],
anomalous dispersion data must be measured and processed with great accu-
racy and care. Diffraction data can be collected from additional derivative
crystals prepared with different anomalous scatterers. The anomalous disper-
sion diffraction data from these additional derivative crystals can then be com-
bined with the native protein and first-derivative data to give additional
phasing information. Such use of multiple anomalous scatterer-derivatized
crystals is called multiple isomorphous replacement with anomalous scatter-
ing (MIRAS).
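How small the anomalous signal will be can be estimated before an experiment. A widely used rule of thumb, usually attributed to Hendrickson, gives the expected Bijvoet ratio as ⟨|ΔF±|⟩/⟨|F|⟩ ≈ (2NA/NT)^1/2 f″/Zeff, where NA is the number of anomalous scatterers, NT the total number of non-hydrogen atoms, f″ the imaginary component of the anomalous scattering factor at the chosen wavelength, and Zeff ≈ 6.7 electrons the effective scattering of an average protein atom. The Python sketch below simply evaluates this estimate; the example numbers are hypothetical and are not taken from the studies cited in this chapter.

```python
import math

Z_EFF = 6.7  # effective normal scattering (electrons) of an average protein atom


def bijvoet_ratio(n_anomalous, n_total, f_double_prime, z_eff=Z_EFF):
    """Estimated <|F+ - F-|> / <|F|> = sqrt(2 * N_A / N_T) * f'' / Z_eff."""
    if n_anomalous <= 0 or n_total <= 0:
        raise ValueError("atom counts must be positive")
    return math.sqrt(2.0 * n_anomalous / n_total) * f_double_prime / z_eff


if __name__ == "__main__":
    # Hypothetical case: 4 anomalous sites in a 4000-atom asymmetric unit,
    # compared for two assumed f'' values.
    for fpp in (4.0, 8.0):
        print(f"f'' = {fpp}: expected Bijvoet ratio = {bijvoet_ratio(4, 4000, fpp):.3f}")
```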
Protein diffraction data can also be phased by purely anomalous dispersion
data, without the use of diffraction data from an underivatized protein crystal.
One of the earliest widespread uses of purely anomalous dispersion diffrac-
tion data in protein X-ray crystallography was multiple-wavelength anom-
alous dispersion (MAD). In MAD experiments, multiple data sets are
collected from a single crystal derivatized with an anomalous scatterer, with
each data set collected at different wavelengths. These wavelengths are at or
near the wavelength that gives the maximum anomalous dispersion signal for
the anomalous scatterer. At each wavelength, there will be more or less
anomalous signal from the anomalous scatterer, which will result in changes
in measured intensity for a particular reflection between the data sets.
The MAD diffraction experiments are entirely analogous to MIR diffrac-
tion experiments. Whereas in MIR diffraction experiments the wavelength
remains fixed and multiple different heavy atom derivatives are used to collect
data sets with differences in measured intensities, MAD diffraction experi-
ments maintain the anomalous scatterer and use multiple different wave-
lengths to collect data sets with differences in measured intensities. MAD
experiments can be considered as in situ isomorphous replacements where
physics, rather than chemistry, is used to produce the change in scattering
intensity at the site [55]. Once the positions of the anomalous scatterers are
located, usually by Patterson methods, the phases for the reflections are deter-
mined as in the MIR case.
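For error-free data, the MIR phase determination referred to above can be written down directly. For one derivative, |FPH|² = |FP|² + |FH|² + 2|FP||FH|cos(αP − αH), so the protein phase is αP = αH ± arccos[(|FPH|² − |FP|² − |FH|²)/(2|FP||FH|)], corresponding to the two intersections of the phase circles in the Argand diagram. A second derivative supplies another pair of candidates, and only the true phase is common to both. The Python sketch below assumes noise-free amplitudes and a heavy-atom contribution FH already calculated from the substructure; the function names are illustrative, and real programs instead combine probability distributions for each phase (for example, using Hendrickson–Lattman coefficients).

```python
import cmath
import math


def sir_phase_candidates(fp_amp, fph_amp, fh):
    """The two protein phases (radians) consistent with one derivative.

    fp_amp, fph_amp: native and derivative amplitudes |FP|, |FPH|.
    fh: complex heavy-atom structure factor FH from the substructure.
    Returns [] if the phase circles do not intersect.
    """
    fh_amp, alpha_h = abs(fh), cmath.phase(fh)
    cos_term = (fph_amp ** 2 - fp_amp ** 2 - fh_amp ** 2) / (2.0 * fp_amp * fh_amp)
    if abs(cos_term) > 1.0 + 1e-9:
        return []
    delta = math.acos(max(-1.0, min(1.0, cos_term)))
    return [alpha_h + delta, alpha_h - delta]


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def mir_phase(fp_amp, derivatives):
    """Choose the first derivative's candidate that best agrees with the others.

    derivatives: list of (fph_amp, fh) pairs; at least two break the ambiguity.
    """
    first_fph, first_fh = derivatives[0]
    best, best_err = None, math.inf
    for candidate in sir_phase_candidates(fp_amp, first_fph, first_fh):
        err = 0.0
        for fph_amp, fh in derivatives[1:]:
            others = sir_phase_candidates(fp_amp, fph_amp, fh)
            err += min((angular_distance(candidate, o) for o in others), default=math.pi)
        if err < best_err:
            best, best_err = candidate, err
    return best


if __name__ == "__main__":
    # Hypothetical, error-free data generated from a known protein phase of 1.0 rad.
    fp = cmath.rect(100.0, 1.0)
    fh1, fh2 = cmath.rect(30.0, 0.2), cmath.rect(25.0, 2.5)
    derivs = [(abs(fp + fh1), fh1), (abs(fp + fh2), fh2)]
    print("SIR candidates:", sir_phase_candidates(abs(fp), *derivs[0]))
    print("MIR phase:", mir_phase(abs(fp), derivs))
```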
In addition to MAD, there is also single-wavelength anomalous dispersion
(SAD) for phasing protein diffraction data. In SAD, X-ray diffraction data are
collected from a single crystal derivatized with an anomalous scatterer at a
single wavelength. The intensities of the reflections $F_{hkl}$ and $F_{\bar h\bar k\bar l}$ are measured separately, essentially giving two data sets from a single crystal. As shown in Figure 10.16a, the Argand diagram analysis of only two diffraction data sets leads to a phase ambiguity. For a SAD diffraction data set, the two different amplitudes for the Bijvoet pair of reflections F+ and F− correspond to the reflections FP and FPH1 in Figure 10.16a. A native diffraction data set collected from an isomorphous crystal lacking the anomalous scatterer can be used to
break the phase ambiguity, which is the SIRAS technique described above. If,
however, no additional isomorphous data sets are available, the correct phases
can still be derived from purely SAD data.
Modern SAD protein structure determinations use probabilistic methods
to determine initial phases and their reliability [53]. Once a set of phases is
determined for a diffraction data set, an electron density map can be calcu-
lated [Eq. (10.13)]. The electron density maps can then be modified according
to reasonable assumptions. A common electron density modification tech-
nique is solvent flattening. In the crystal, the interstitial regions between the
protein molecules are occupied by solvent, which is usually disordered. This
disordered solvent region should be relatively featureless, compared with the
protein. By iteratively smoothing the electron density in the solvent region,
the electron density features in the protein region will become enhanced until
they become interpretable. This method of iterative electron density modifi-
cation to determine protein structures from pure SAD data is called iterative
single-wavelength anomalous scattering (ISAS) [56]. In general, the steps
involved in determining protein structures from pure SAD data are locating the anomalous scatterers, determining the initial phases, and modifying the electron density until the maps become interpretable.
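The iterative solvent flattening described above can be sketched for a one-dimensional map with NumPy. The sketch assumes that the protein/solvent mask is already known (in practice it is estimated from the map and an assumed solvent content) and keeps the observed amplitudes fixed. Each cycle replaces the solvent density with its mean and takes new phases from the Fourier transform of the modified map. It omits the figure-of-merit weighting, phase combination, and three-dimensional envelope determination performed by real density modification programs, so it illustrates the idea rather than the ISAS procedure itself. The example prints mean phase errors before and after so that the effect can be inspected.

```python
import numpy as np


def flatten_solvent(rho, protein_mask):
    """Return a copy of rho with every solvent point set to the mean solvent density."""
    flat = rho.copy()
    solvent = ~protein_mask
    flat[solvent] = rho[solvent].mean()
    return flat


def solvent_flattening(f_obs, phases, protein_mask, cycles=10):
    """Iterate map -> flatten -> new phases while keeping |F_obs| fixed.

    f_obs, phases: arrays of length n // 2 + 1, the non-redundant half of the
    transform of a real 1D map of n points (numpy.fft.rfft convention).
    protein_mask: boolean array of length n, True inside the protein.
    """
    n = protein_mask.size
    for _ in range(cycles):
        rho = np.fft.irfft(f_obs * np.exp(1j * phases), n=n)   # current map
        rho_mod = flatten_solvent(rho, protein_mask)            # impose flat solvent
        phases = np.angle(np.fft.rfft(rho_mod))                 # keep new phases only
    return phases


def mean_phase_error(phases, reference):
    """Mean absolute phase difference in radians, wrapped to [0, pi]."""
    return float(np.mean(np.abs(np.angle(np.exp(1j * (phases - reference))))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 128
    mask = np.zeros(n, dtype=bool)
    mask[:64] = True                                   # hypothetical 50% solvent
    true_map = np.where(mask, rng.random(n), 0.0)      # flat (zero) solvent
    f_true = np.fft.rfft(true_map)
    start = np.angle(f_true) + rng.normal(0.0, 0.8, f_true.size)  # noisy phases
    refined = solvent_flattening(np.abs(f_true), start, mask, cycles=50)
    print("mean phase error before:", mean_phase_error(start, np.angle(f_true)))
    print("mean phase error after: ", mean_phase_error(refined, np.angle(f_true)))
```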
For the crystal structure determination of proteins with unknown folds,
anomalous dispersion techniques such as MAD and SAD are much more
popular than pure isomorphous replacement techniques such as MIR. One of
the major advantages of anomalous dispersion over isomorphous replacement
techniques is that anomalous dispersion data sets can all be collected from a
single crystal. This avoids the problem of nonisomorphism, which can make
MIR data collection and interpretation difficult. In MIR, multiple crystals
derivatized with multiple different heavy atoms are used for data collection.
Each heavy atom and its soaking conditions can have different effects on the
crystal, sometimes causing significant distortions of the crystal unit cell. MIR
data collected from crystals whose unit cells are significantly different from
that of the native protein crystal (nonisomorphous data sets) are of little value
for phasing. With MAD and SAD experiments, using cryo-data collection
techniques, a single crystal derivatized with an anomalous scatterer can be
used to collect all of the diffraction data sets. All of the data sets are
therefore isomorphous since all of the data were collected from a single
crystal.
Molecular Replacement
The isomorphous replacement and anomalous dispersion methods are
required for determination of protein structures of unknown folds. These
methods assume no prior knowledge of the target protein structure and
involve the initial determination of a heavy atom or anomalous scatterer sub-
structure. These substructures are then used to calculate phases for the protein
tools for primary structure analysis. These tools allow identification of known
amino acid motifs, such as substrate or cofactor binding motifs (e.g., the
Rossmann fold, P-loop motif, iron-sulfur clusters) and protein domains with
known homologous structures that might be represented in the sequence of
interest.
The SCOP ([61]; scop.mrc-lmb.cam.ac.uk/scop/) and CATH ([62];
http://www.biochem.ucl.ac.uk/bsm/cath/) online databases provide classifica-
tion of protein structure with examples of each represented fold. An excellent
resource for learning the basics of protein structure from peptide conforma-
tions to common folds is Introduction to Protein Structure by Carl Branden
and John Tooze (Garland Science, UK). Knowledge of simple common aspects
of tertiary structure such as the common helix-helix knobs-in-holes packing
angles, the left-handed twist of β sheets and the right-handed twist of individual β strands, and the ways in which loop regions most commonly connect
secondary structures will assist greatly in interpretation of blurry regions
of electron density and recognition of secondary structure elements at low
resolution. Another valuable reference is the classic study by P.N. Lewis,
F.A. Momany, and H.A. Scheraga [63]. This study details the peptide
conformations of the tight turns and chain reversals often encountered at the
surfaces of proteins.
The main objectives of initial model building are to find the correct trace
for the peptide chain and to make the correct sequence assignment for each
residue. Before beginning actual model building, it is advisable to examine a
number of electron density maps calculated to different resolutions and
with different weighting [figure of merit or σA] and density modification
schemes (solvent flipping or flattening, histogram matching, or skeletoniza-
tion). Less accurate phases for the highest resolution data might create noise
that makes interpretation of the maps difficult. Initial maps calculated at lower
resolution might reveal the trace of the peptide chain more clearly. The per-
centage of solvent content used in solvent flattening calculations can have a
dramatic impact on the quality of the map, and the best estimate of the real
solvent content will not necessarily produce the most interpretable electron
density maps. Several values above and below the best estimate of the true
value should be tried.
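Examining maps at several resolutions and with several solvent-content values is easy to organize in code. The Python sketch below assumes a cell with all angles at 90° (orthorhombic, tetragonal, or cubic), for which 1/d² = h²/a² + k²/b² + l²/c². It keeps only reflections with d at or above a chosen limit, which gives the input for a lower-resolution map, and generates trial solvent fractions above and below a best estimate. The function names, the 0.05 step, and the example cell are illustrative assumptions.

```python
import math


def d_spacing(h, k, l, a, b, c):
    """Interplanar spacing d(hkl) in angstroms for a cell with 90-degree angles."""
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d_squared)


def truncate_to_resolution(reflections, cell, d_min):
    """Keep reflections (h, k, l, ...) with d >= d_min, i.e. lower resolution only."""
    a, b, c = cell
    return [r for r in reflections if d_spacing(r[0], r[1], r[2], a, b, c) >= d_min]


def solvent_fraction_trials(best_estimate, step=0.05, n_each_side=2):
    """Solvent fractions bracketing the best estimate, e.g. 0.40 ... 0.60."""
    values = (best_estimate + i * step for i in range(-n_each_side, n_each_side + 1))
    return [round(v, 3) for v in values if 0.0 < v < 1.0]


if __name__ == "__main__":
    cell = (60.0, 75.0, 90.0)                         # hypothetical cell (angstroms)
    refl = [(h, k, l) for h in range(25) for k in range(25) for l in range(25)
            if (h, k, l) != (0, 0, 0)]
    for d_min in (4.0, 3.0, 2.5):
        print(d_min, len(truncate_to_resolution(refl, cell, d_min)))
    print(solvent_fraction_trials(0.48))
```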
At this early stage density skeletonization, or “bones,” calculations are a
tremendous aid in determining the quality of electron density maps. The
overall connectedness of the electron density can be gauged and secondary
structure elements can often be recognized. Skeletonization calculations
produce traces through the highest electron density points in the map. In well-
phased electron density maps the bones will clearly reveal the trace of the
peptide chain. In a poorly phased map the bones will be fragmented. Some
arbitrary adjustment of the parameters used in the skeletonization calculations
is often required to obtain the best results. If the map has been phased experi-
mentally by isomorphous replacement or anomalous dispersion methods, it
can be helpful to overlay the positions of the phasing sites onto the electron
$$Q = \sum_{hkl} w(hkl)\bigl(|F_o(hkl)| - k\,|F_c(hkl)|\bigr)^{2} \qquad (10.17)$$
where w(hkl) is a weight, often resolution dependent in macromolecular refinement, and k is a scaling constant that allows meaningful comparison between |Fo| and |Fc|. The summation is over all m observed amplitudes |Fo|, which constitute a set of m equations. Each corresponding |Fc| is a function of the n parameters of the model, which are the n unknowns. The sum-of-squares error function can then be represented as a system of equations in matrix form. For a given set of starting model parameters, the system of equations can be solved by iterative methods, using a truncated Taylor expansion to represent |Fc| as a function of the n model parameters and simplified block-diagonal least-squares matrices in which the terms coupling parameters that are not highly correlated are set to zero (terms for highly correlated parameters, such as B factor and occupancy when occupancy is refined, are retained). Because of these approximations, multiple iterations, each corresponding to small shifts in the values of the model parameters, are required to reach convergence.
Maximum-likelihood methods are statistical and derive from Bayes’s theorem, which is expressed by the equation
$$P(A)\,P(B \mid A) = P(B)\,P(A \mid B) \qquad (10.18)$$
In words, Eq. (10.18) states that (the probability of A) times (the probability of B given that A is true) is equal to (the probability of B) times (the probability of A given that B is true). The corresponding expression for crystallographic refinement is
$$P(|F_o|)\,P(\mathrm{model} \mid |F_o|) = P(\mathrm{model})\,P(|F_o| \mid \mathrm{model}) \qquad (10.19)$$
It is more convenient to introduce |Fc| as a function of the model parameters and write
$$P(|F_o|)\,P(|F_c| \mid |F_o|) = P(|F_c|)\,P(|F_o| \mid |F_c|) \qquad (10.20)$$
The prior probability P(|Fc|) and the normalizing factor P(|Fo|) are constants, however, and can be omitted without altering the essence of the relationship. The remaining terms are the posterior probability P(|Fc| | |Fo|) and the likelihood P(|Fo| | |Fc|). Without the constants P(|Fc|) and P(|Fo|), this last term can no longer properly be designated a probability and is recast as the likelihood L(|Fo| | |Fc|), leaving the expression
$$P(|F_c| \mid |F_o|) \propto L(|F_o| \mid |F_c|) \qquad (10.21)$$
The total likelihood is expressed as the joint probability for all Fo:
$$L_{\mathrm{total}} = \prod_{hkl} P\bigl(|F_{o,hkl}| \mid |F_{c,hkl}|\bigr) \qquad (10.22)$$
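The least-squares target of Eq. (10.17) and the product form of the total likelihood can each be evaluated in a few lines. For fixed |Fc|, the scale factor that minimizes Q has the closed form k = Σw|Fo||Fc| / Σw|Fc|². The likelihood in the Python sketch below is a deliberately simple stand-in that assumes independent Gaussian errors of known σ for each |Fo|. Taking the logarithm turns the product over reflections into a sum, and with w = 1/σ² the negative log-likelihood differs from Q/2 only by a constant, which is why least squares is a special case of maximum likelihood. Real maximum-likelihood refinement targets use more realistic error models that account for model incompleteness and errors (for example, σA-based distributions); the function names and example numbers here are hypothetical.

```python
import math


def optimal_scale(fo, fc, w):
    """Scale k minimizing sum w*(|Fo| - k|Fc|)**2: sum(w Fo Fc) / sum(w Fc**2)."""
    num = sum(wi * o * c for wi, o, c in zip(w, fo, fc))
    den = sum(wi * c * c for wi, c in zip(w, fc))
    return num / den


def lsq_target(fo, fc, w, k):
    """Q of Eq. (10.17): weighted sum of squared amplitude residuals."""
    return sum(wi * (o - k * c) ** 2 for wi, o, c in zip(w, fo, fc))


def gaussian_log_likelihood(fo, fc, sigma, k):
    """Log of the product of per-reflection Gaussian P(|Fo| given k|Fc|)."""
    total = 0.0
    for o, c, s in zip(fo, fc, sigma):
        total += -0.5 * ((o - k * c) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
    return total


if __name__ == "__main__":
    fo = [152.0, 88.0, 301.0, 47.0]      # hypothetical observed amplitudes
    fc = [70.0, 45.0, 148.0, 26.0]       # hypothetical calculated amplitudes
    sigma = [5.0, 4.0, 8.0, 3.0]
    w = [1.0 / s ** 2 for s in sigma]
    k = optimal_scale(fo, fc, w)
    print(f"k = {k:.3f}, Q = {lsq_target(fo, fc, w, k):.3f}, "
          f"log L = {gaussian_log_likelihood(fo, fc, sigma, k):.3f}")
```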
In some cases only careful experiments or a high-resolution crystal structure
of the compound will resolve uncertainties. In the absence of such data some
guesses must be made. The standard two-dimensional chemical structure representation of the anticancer drug topotecan (Hycamtin) is given in Figure 10.21 as an example. It is not necessarily obvious from the two-dimensional representation of the structure that N3 of the tertiary amine substituent is sp3 hybridized and could be protonated depending on the pH, that C10 is sp3 hybridized and is not part of the conjugated ring system, or that the pyridone ring (D) is aromatic. This information was determined partly from standard
chemical knowledge and partly from the crystal structure of a related com-
pound, camptothecin iodoacetate [64].
Hydrogen atoms are usually excluded from the refined model unless the resolution is exceptionally high. This is due to their small contribution to X-ray diffraction (only one electron) and their large contribution to the total number of parameters in the model. Some refinement packages allow the model to be refined with riding hydrogens (hydrogens placed in standard positions with standard bond lengths and angles) added to the model. The hydrogens are not refined independently, but they do affect refinement, slightly through Fc and mainly through the van der Waals terms, which can lead to improved distributions of torsion angles.
Further restraints can be added to take advantage of correlated vibrational
properties. Simple B-factor restraints account for the physical reality that the
vibrational amplitudes of covalently bonded atoms can be correlated. A recent
advance in model parameterization has been the introduction of TLS groups
in refinement [65]. TLS groups are substructures of the model (sometimes
entire domains) that have correlated vibrational properties. These correlated
vibrations are described by three tensors that designate the translational (T),
librational (oscillation) (L), and screw (S) components of the motion. TLS
groups account for an entire continuum of conformations of part of the model
in a parsimonious manner with respect to the number of parameters required.
Figure 10.21 Standard two-dimensional chemical structure representation of
topotecan.
The Rfactor gives a normalized sum-error between the observed and calcu-
lated structure factors and is often expressed as a percentage. As the model
comes to give a better representation of the observed data, k|Fc| will approach
the value of |Fo| and the numerator in Eq. (10.25) will decrease. Thus the
overall Rfactor will decrease as refinement progresses. The Rfactor is the most commonly cited metric of goodness of fit in refinement, but many others, including correlation coefficients, figures of merit, σA, and some variants of the Rfactor, are also often calculated and output by refinement software. Both least-squares and maximum-likelihood methods allow estimations of the standard
uncertainties of the xyz coordinates, occupancies, and B factors. Lower stan-
dard uncertainties should correlate with improvements in Rfactor and the other
metrics. All refinement programs will output a list of rms deviations from
ideality for all restraint terms. These data are a crucial guide for achieving a
properly refined model and for judging the overall X-ray vs. geometry weight.
In addition to the overall statistics for restraints, refinement programs will
also list specific bonds, angles, torsions, and the like that deviate significantly
(greater than three standard deviations at least) from the ideal values. These
outliers often indicate regions of the model that require further interactive
adjustment before refinement will converge.
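In its conventional form, assumed here to match Eq. (10.25), the R factor is R = Σ||Fo| − k|Fc|| / Σ|Fo|. The Python sketch below computes it, randomly flags a test set of reflections, and reports R separately for the working and test sets; the test-set value is the free R factor discussed below. The 5 percent default, the seeds, and the function names are illustrative choices rather than requirements of any particular refinement program.

```python
import random


def r_factor(fo, fc, k=1.0):
    """R = sum | |Fo| - k|Fc| | / sum |Fo| over the supplied reflections."""
    if not fo:
        raise ValueError("no reflections supplied")
    return sum(abs(o - k * c) for o, c in zip(fo, fc)) / sum(fo)


def assign_free_flags(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag reflections for the test set (True = excluded from refinement)."""
    rng = random.Random(seed)
    return [rng.random() < free_fraction for _ in range(n_reflections)]


def r_work_and_free(fo, fc, flags, k=1.0):
    """Return (R_work, R_free) using the test-set flags."""
    work = [(o, c) for o, c, free in zip(fo, fc, flags) if not free]
    test = [(o, c) for o, c, free in zip(fo, fc, flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work], k)
    r_free = r_factor([o for o, _ in test], [c for _, c in test], k)
    return r_work, r_free


if __name__ == "__main__":
    rng = random.Random(1)
    fo = [rng.uniform(20.0, 400.0) for _ in range(2000)]     # hypothetical data
    fc = [o * rng.uniform(0.8, 1.2) for o in fo]              # a model within ~20%
    flags = assign_free_flags(len(fo), 0.05, seed=2)
    print("R_work = %.3f, R_free = %.3f" % r_work_and_free(fo, fc, flags))
```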
In addition to judging the progress of refinement toward the best-fit model
for the data, the refinement results must be used to gauge the appropriateness
of the parameterization of the model. If the observation/parameter ratio is
low, it is possible to overfit the model and produce a refined structure that
appears to fit the data much better than it actually does. This is analogous to
fitting a two-exponential decay with a three-exponential equation: The fit
might very well be better, but the model parameters obtained will not repre-
sent the true nature of the data. An obvious case of this problem would be
refinement with anisotropic B factors against data to no better than about
2.5Å. The extra parameters will allow an apparent better fit to the data, but
they will not represent true individual atom anisotropic thermal motion
because it is not properly represented in the data.
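A rough observation-to-parameter count makes the overfitting argument concrete. The Python sketch below counts three positional parameters per atom plus one isotropic or six anisotropic displacement parameters, and 20 parameters for each TLS group (six each for the symmetric T and L tensors and eight for S). It ignores occupancies and does not count restraints, although in real refinement geometric restraints act as additional observations; the atom and reflection counts in the example are hypothetical.

```python
ADP_PARAMS = {"isotropic": 1, "anisotropic": 6}
TLS_PARAMS_PER_GROUP = 20   # T: 6, L: 6, S: 8


def parameter_count(n_atoms, adp="isotropic", n_tls_groups=0):
    """xyz plus displacement parameters per atom, plus TLS parameters per group."""
    return n_atoms * (3 + ADP_PARAMS[adp]) + TLS_PARAMS_PER_GROUP * n_tls_groups


def observation_parameter_ratio(n_reflections, n_atoms, adp="isotropic", n_tls_groups=0):
    """Unique reflections per refined parameter (restraints not counted)."""
    return n_reflections / parameter_count(n_atoms, adp, n_tls_groups)


if __name__ == "__main__":
    n_atoms, n_refl = 2400, 21000      # hypothetical protein and data set
    for adp, tls in (("isotropic", 0), ("isotropic", 4), ("anisotropic", 0)):
        ratio = observation_parameter_ratio(n_refl, n_atoms, adp, tls)
        print(f"{adp:12s} TLS groups={tls}: {ratio:.2f} reflections per parameter")
```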
In most cases the choice of model parameterization requires more subtle
judgment, especially for restraints and weights. The free Rfactor (Rfree) was intro-
duced as a means of validating the choice of model parameterization. Rfree is
a standard Rfactor, but it is calculated against a small subset of the data (5 to 10
percent) that is set aside and not used in model refinement. The agreement
between these data and the model, therefore, provides an unbiased gauge of
the appropriateness of the model parameterization. If the model parameters
are appropriate, the Rfactor and Rfree will both decrease as refinement progresses
and the agreement between the model and the working data is improved. If
the data are being overfit the Rfree remains static or increases even though the
Rfactor continues to decrease. The standard Rfree is a limited application of the
notion of the jackknife test in which data are binned randomly and the model
fit many times. Each time the model is fit with a different bin omitted and the
result tested for agreement against the omitted data. This prevents the inter-
[Figure 10.22 flowchart, "Structure-based drug design". Main path: choose drug target; obtain pure preparation of target in solution; determine structure by crystallography or NMR; analyze structure to determine possible inhibitor binding sites; dock and score compounds from database against target’s selected sites; analyze ranked list of scored compounds and optimize top pick for binding and selectivity; purchase or synthesize lead and test for binding in biochemical assays; determine structure of target and lead using NMR or XRC; analyze structure of target and lead for interactions; make lead bioavailable and test for potency; clinical trials; commercial drug. Decision points: Is lead a micromolar inhibitor in solution? Is lead a nM inhibitor? Can lead be modified and optimized? Alternative branches: homology modeling (use known similar structure and modify sequence for desired target); pick next lead in list; analyze and optimize; modify and optimize lead in silico.]
Figure 10.22 Schematic work flow of structure-based drug design. (Figure taken from
Anderson [4].)
and the SBDD cycle is repeated until an acceptable compound is discovered
or the project is canceled.
Arguably the first marketed drug whose generation was significantly
impacted by X-ray crystallography is the angiotensin-converting enzyme
(ACE) inhibitor Captopril from Bristol Myers Squibb. Curiously though,
structures of the actual target apo-ACE or those of ACE in complex with
inhibitors were not available. Instead, in the 1970s investigators used the X-
ray crystallographic structure of the closely related zinc protease bovine car-
boxypeptidase A in complex with benzylsuccinate to create a homology model
of ACE. Using this homology model and the structure of the inhibitor complex
investigators rationalized the synthesis of new drug compounds, eventually
creating Captopril (Fig. 10.23), a highly selective compound with tight binding
affinity and few side effects. Captopril was approved by the Food and Drug
Administration (FDA) and marketed in 1981. Today, Captopril has undergone
the usual lifecycle of approved drugs and is available as a generic.
The first successful drug to reach clinical use that was based on an SBDD
development cycle is claimed by Merck Pharmaceuticals. Trusopt (generic name dorzolamide) is an inhibitor of carbonic anhydrase II. The discovery of Trusopt
was stimulated by X-ray crystallographic structures of carbonic anhydrase II
in complex with acetazolamide and other sulfonamides. Rational design orig-
inating from the zolamide scaffold assisted in the discovery of inhibitor com-
pounds with activities increased by three orders of magnitude (Fig. 10.24).
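[Figure 10.23 structures: natural substrate (top left) and the inhibitor benzylsuccinate (top right) bound to carboxypeptidase A through the Zn2+ ion and an arginine guanidinium group; N-succinoyl-proline, IC50 = 630 μM (bottom left); Captopril, IC50 = 23 nM (bottom right).]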
Figure 10.23 X-ray crystallographic data of carboxypeptidase A shows that a guani-
dinium group in an arginine residue and a zinc ion are crucial for complexing the
natural substrate (top left) and the inhibitor compound benzylsuccinate (top right).
The optimization of the first ACE lead N-succinoyl-proline (bottom left) to Captopril
(bottom right) was aided by rationalization of the binding interaction in the homolo-
gous enzyme. (Taken with permission from Kubinyi [72]).
HIV Protease Inhibitors The most profound impact that X-ray crystallography has had on current clinical practice and public health was the discovery of the human immunodeficiency virus (HIV) protease inhibitors. HIV protease is an enzyme crucial for the replication of HIV: it processes two of the three gene products of HIV into active enzymes and is required for viral reproduction. Inhibition of the enzyme can therefore prevent HIV replication.
Pharmaceutical companies had a strong background in the development of
protease inhibitors, as many therapeutic targets are also proteases. The first X-
ray crystallographic structure of HIV protease was published in 1989 and,
within 8 years, four separate compounds from four different companies, each
developed using SBDD, were approved by the FDA for HIV treatment. The
convergence of several unique circumstances aided this remarkable achieve-
ment: The expertise in pharmaceutical companies working on aspartyl pro-
teases coupled with an intense public interest and generous financial support
by the U.S. government provided the extraordinary context. Crucially, HIV
protease is an enzyme amenable to facile co-crystallization with drug com-
pounds, and the availability of appropriate structural information at a timely
point in several drug discovery projects allowed a concentrated SBDD focus.
[Figure 10.24 structures: acetazolamide, intermediate sulfonamide inhibitors, and dorzolamide, with labeled inhibition constants Ki = 300 nM and Ki = 0.37 nM.]
Figure 10.24 Carbonic anhydrase II inhibitors. Structures of carbonic anhydrase in complex with moderately active compounds led to the discovery of the highly active compound dorzolamide.

Figure 10.25 Pathway of the discovery of Mozenavir. DuPont’s initial HIV protease inhibitors (compound 7) resulted from a computer-based three-dimensional search of known chemical entities using a pharmacophore hypothesis (top left). Scaffolds 8, 9, and 10 were used as base compounds to which functional groups were added at the P1 and P2 positions of these pseudosymmetric compounds. The lower half of the panel shows the final compound DMP-450 (Mozenavir) with distances in Å to protein atoms of the two molecules of HIV protease in the crystal structure.
[Figure 10.25 structures: pharmacophore hypothesis with lipophilic groups P1 and P1′ separated by 8.5–12.0 Å and an H-bond donor/acceptor at 3.5–6.5 Å; lead compound 7 (DuPont Merck); scaffolds 8, 9, and 10; compound 11 (DuPont Merck), Ki = 0.018 nM, with contact distances of 2.0–4.0 Å to Gly-48/Gly-48′, Ile-50/Ile-50′, Asp-30/Asp-30′, and Asp-25/Asp-25′ of the two protease molecules.]
The discovery path of Mozenavir, an HIV protease inhibitor, nicely illustrates the SBDD process. DuPont began its HIV protease investigation using a pharmacophore hypothesis based on the X-ray crystallographic structure of the apo-enzyme HIV protease. This pharmacophore consisted of two lipophilic groups separated by 8.5 to 12 Å and one hydrogen bond donor/acceptor at a distance of 3.5 to 6.5 Å (Fig. 10.25). Extensive
searches of the Cambridge Structural Database, composed of three-
dimensional structures of small-molecule compounds, revealed a potential
lead compound. The initial lead was pseudosymmetrical, with the two ends of
the molecule being virtual mirror images of each other. This property was sig-
nificant because it was known that HIV protease functions as an obligatory
dimer. From the crystal structure it was known that the interface of the two
proteins is composed of identical residues from each protease molecule, sug-
gesting that inhibitor compounds may contain symmetrical moieties. Co-
crystal structures with initial lead compounds revealed an important feature
of the potential inhibitor binding site. One crucial observation was the real-
ization that the methoxy groups of compound 7 (Fig. 10.25) could replace an
ordered water molecule. Different scaffold compounds were then chosen to
add methoxy- as well as other functional groups. Iterative SBDD cycles con-
tinued until several molecules were identified that had nanomolar affinities
and favorable pharmacokinetic properties. Clinical development of the final
compound, DMP450 (Mozenavir) was turned over to Triangle Pharmaceuti-
cals and stopped in 2002 due to Mozenavir’s side effects [70].
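A three-dimensional pharmacophore search of the kind described here reduces, for each candidate conformation, to checking distances between feature points. The Python sketch below tests the two distance ranges given in the text. Because the text does not say from which point the 3.5 to 6.5 Å donor/acceptor distance is measured, the sketch assumes it is measured from each lipophilic center; the feature coordinates are hypothetical, and a real database search would also enumerate conformations and assign the features chemically.

```python
import math


def matches_pharmacophore(lipo1, lipo2, hbond,
                          lipo_range=(8.5, 12.0), hbond_range=(3.5, 6.5)):
    """True if feature points (x, y, z in angstroms) satisfy both distance ranges.

    Assumption: the donor/acceptor distance is measured from each lipophilic center.
    """
    d_lipo = math.dist(lipo1, lipo2)
    if not lipo_range[0] <= d_lipo <= lipo_range[1]:
        return False
    return all(hbond_range[0] <= math.dist(hbond, center) <= hbond_range[1]
               for center in (lipo1, lipo2))


if __name__ == "__main__":
    # Hypothetical feature coordinates for three conformers: (lipo1, lipo2, hbond)
    conformers = {
        "conf_a": ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 2.0, 0.0)),
        "conf_b": ((0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (3.0, 2.0, 0.0)),
        "conf_c": ((0.0, 0.0, 0.0), (11.0, 1.0, 0.0), (5.5, 7.5, 0.0)),
    }
    hits = [name for name, pts in conformers.items() if matches_pharmacophore(*pts)]
    print("conformers matching the pharmacophore:", hits)
```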
Protein Kinases Protein kinases are therapeutic targets for a variety of diseases. More than 500 kinases have been identified in the human genome. They are adenosine 5′-triphosphate (ATP)-dependent enzymes that phosphorylate other proteins. Most kinases act in cell-signaling pathways, phosphorylating other signaling proteins whose activity is then either turned on or turned
off as a result of the attached negatively charged phosphate group. The cat-
alytic domains of most kinases are structurally conserved; however, their
mechanisms of regulation are distinctly different. The catalytic domains are
composed of a bi-lobed structure consisting of a helical domain and a β-strand
domain (Fig. 10.26). ATP is bound at the interface of the two domains, and
this binding site has been deemed an attractive site for designing ATP com-
petitive inhibitors that block ATP binding, thus preventing phosphorylation.
Unfortunately, the ATP binding site is highly conserved in most kinases, thus
rational design faces serious challenges in creating inhibitors selective for a
specific kinase with little inhibition of other closely related kinases. SBDD of
kinase inhibitors is popular because of the low experimental barriers. Con-
served and unique residues can be identified in the inhibitor binding site and
rational design attempts to maximize interactions with unique residues in the
inhibitor binding site while minimizing interactions with conserved residues.
The underlying hypothesis is that specificity can be created as interactions with
the inhibitor are increased with residues specific for the kinase in question
while cross-reactivity decreases for other kinases.
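The residue-classification step described above can be sketched in a few lines of Python. This is a minimal illustration, not a published procedure: it assumes the binding-site residues of the target and of the off-target kinases have already been aligned position by position (in practice, for example, from superposed crystal structures), and the kinase names and residue strings below are invented for the example. Positions where all kinases agree are "conserved"; positions where no off-target kinase shares the target's residue are candidate "unique" contacts for selectivity.

```python
# Hypothetical sketch: classify aligned binding-site positions as conserved
# across all kinases or unique to the target kinase. The residue strings are
# invented for illustration and are not real kinase sequences.


def classify_site_positions(target: str, others: dict[str, str]) -> dict[str, list[int]]:
    """Return alignment indices that are conserved in all kinases or unique to the target.

    A position is 'conserved' if every off-target kinase has the target's residue
    there, and 'unique' if no off-target kinase shares the target's residue there.
    Partially shared positions fall in neither list.
    """
    for name, seq in others.items():
        if len(seq) != len(target):
            raise ValueError(f"{name}: alignment length {len(seq)} != {len(target)}")

    conserved: list[int] = []
    unique: list[int] = []
    for i, residue in enumerate(target):
        other_residues = {seq[i] for seq in others.values()}
        if other_residues == {residue}:
            conserved.append(i)
        elif residue not in other_residues:
            unique.append(i)
    return {"conserved": conserved, "unique": unique}


# Hypothetical binding-site residues, one letter per aligned position.
target_site = "VAKTEMYD"
off_target_sites = {
    "kinase_B": "VAKMEMLN",
    "kinase_C": "VAKMEFLD",
    "kinase_D": "VAKMELLN",
}

result = classify_site_positions(target_site, off_target_sites)
print(result)  # {'conserved': [0, 1, 2, 4], 'unique': [3, 6]}
```

In this toy alignment, positions 3 and 6 would be the ones a designer tries to contact, while contacts at positions 0, 1, 2, and 4 contribute affinity but no selectivity.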
Antistructures
As discussed, the main applications of X-ray crystallography in drug discovery
and optimization projects are based on the analysis of the target and of the
interactions of targets with their ligands. The design efforts are aimed at
strengthening the resulting complexes. The opposite approach, weakening the
interaction of drug leads with certain proteins, may however be used to one's
advantage. These interacting proteins are not the actual targets but proteins
that cause detrimental effects on drug efficacy. Such proteins are sometimes called
antitargets; they may be related enzymes with similar substrate binding pockets
but very different functions, such as kinases or phosphatases.
Generally, weak binding of small-molecule drugs to serum albumin and to
detoxification proteins such as cytochrome P450 is a desired property. P450
enzymes are involved in the oxidative metabolism of most drugs and are often the
source of drug-related side effects and toxicity. Several P450 structures are
available [73]; they may be used for in silico docking studies, and the published
crystallization methods may be used to grow crystals for soaking or
co-crystallization studies. The goal of such projects is to increase lead efficacy
by defining lead modifications that weaken the interaction with antitarget
proteins while maintaining the affinity for the target protein. It would be highly
desirable to extend this negative template design further, for example to
membrane-bound drug transporters, so that the critical processes of absorption,
distribution, metabolism, excretion, and toxicity could be modulated on the basis
of molecular structural insight. These possible applications indicate that X-ray
crystallography is heading toward affecting later-stage discovery and early-stage
drug development projects.
[Figure 10.26 artwork: kinase–inhibitor complex; labeled features: N-terminal lobe, hinge, Thr315, Phe382, activation segment, C-terminal lobe.]
Figure 10.26 Overview of the structure of a kinase with an inhibitor bound. Kinases
offer the possibility to design inhibitors based on stabilization of inactive conformations.
Human serum albumin (HSA) consists of three domains with six rather
promiscuous ligand binding sites. Many drug leads bind to this protein, which
poses a serious problem for lead discovery. In an instructive example, the
feasibility of this "design away" approach was demonstrated for diflunisal, a
nonsteroidal anti-inflammatory cyclooxygenase inhibitor. Some 99 percent of
diflunisal in serum is unavailable to the target because it is bound to human
serum albumin. High doses of up to 250 mg diflunisal therefore have to be
administered, causing gastrointestinal irritation as a serious side effect. In a
structure-based drug design effort, diflunisal analogs were synthesized that were
predicted to bind less tightly to HSA-III (a human serum albumin subdomain).
Several compounds were generated that exhibited more than 100-fold reduced
binding to HSA-III (with only a 10-fold reduction in affinity for full-length
albumin). Significantly, several of these compounds maintained their activity
against the actual target, cyclooxygenase-2 [74].
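Why 99 percent binding matters so much, and why even a 10-fold loss in affinity for full-length albumin can help, can be illustrated with a simple equilibrium model. The sketch below assumes a single 1:1 binding site and albumin in large excess over drug, so that free albumin is approximately total albumin [P]; real diflunisal binding involves more than one site, so the numbers are indicative only.

```python
# Minimal sketch of how plasma-protein binding limits free drug.
# Assumptions (not from the source): one 1:1 binding site, albumin in large
# excess over drug, so free albumin ~ total albumin [P].
# Then bound/free = [P]/Kd and the unbound fraction is fu = 1 / (1 + [P]/Kd).


def fraction_unbound(p_over_kd: float) -> float:
    """Unbound drug fraction for a given albumin concentration / Kd ratio."""
    return 1.0 / (1.0 + p_over_kd)


def p_over_kd_from_fu(fu: float) -> float:
    """Invert fu = 1 / (1 + [P]/Kd) to recover the ratio [P]/Kd."""
    return 1.0 / fu - 1.0


fu_parent = 0.01                           # ~99 percent bound, as stated in the text
ratio = p_over_kd_from_fu(fu_parent)       # [P]/Kd for the parent compound
fu_analog = fraction_unbound(ratio / 10)   # Kd raised 10-fold for full-length albumin

print(f"[P]/Kd = {ratio:.1f}")
print(f"unbound fraction: {fu_parent:.3f} -> {fu_analog:.3f}")
print(f"free-drug increase: {fu_analog / fu_parent:.1f}-fold")
```

Under these assumptions, raising Kd 10-fold lowers [P]/Kd from 99 to 9.9 and increases the unbound fraction from 1 percent to about 9 percent, roughly a ninefold gain in free drug.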
Protein Therapeutics
Erythropoietin (EPO) and insulin are hallmarks of protein therapeutics, a
fast-growing class of drugs. Hardly any new optimization program in this class
forgoes X-ray crystallography. Because they are based on natural proteins,
protein therapeutics are directly amenable to X-ray crystallographic investigation
and subsequent redesign. Classes of protein or peptide therapeutics include
monoclonal antibodies, cytokines, enzymes, and viral fusion inhibitors. In a
landmark study, Ewert et al. [75] used X-ray crystallographic structure-based
antibody engineering to help identify residues that improve unsatisfactory
antibody properties. In a different case, an HIV entry inhibitor was designed
based on CD4. The 27-amino-acid CD4 mimic interacts with gp120 and binds to
HIV particles with CD4-like affinity. This mini-CD4 is a prototype HIV-1
inhibitor and a potential component of vaccine formulations [76].
In silico Screening Based on Crystallographic Structural Models
X-ray crystallographic structures of proteins may be used to preselect a small
number of compounds from compound libraries. Different types of computer-based
algorithms, such as docking [77], may be employed to predict the formation of
ligand–protein complexes with single compounds or with entire compound
libraries ("virtual screening" or "in silico screening"). This computational
approach is particularly advantageous for targets whose structural information is
readily available at no cost (i.e., in the public domain). A
side-by-side comparison of assay-based high-throughput screening (HTS) with
such virtual screening on the same target protein has been described by
Doman et al. [5]. Numerous structures of PTP1B (protein tyrosine phosphatase 1B),
a type 2 diabetes drug target, are available from the Protein Data Bank.
Molecular docking of ca. 235,000 compounds into the closed, ligand-removed
structure of PTP1B yielded 365 high-scoring candidates that were tested for
inhibition in enzyme-based assays. Of these candidates, 34.8 percent inhibited
PTP1B with IC50 values below 100 μM, representing an enrichment of 1700-fold
compared with random screening. Conventional high-throughput screening, on
the other hand, yielded only 85 hits with IC50 values below 100 μM out of
400,000 tested molecules (corresponding to a 0.021 percent hit rate).
Interestingly, the two hit lists were rather different, and the hits generated by
molecular docking appeared more druglike than the HTS hits, suggesting that
the two screening techniques may be used in a complementary way. Indeed,
the integration of virtual and high-throughput screening is judged to be a
promising approach in modern lead discovery projects [78].
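The hit rates quoted above can be checked with simple arithmetic. The sketch uses only numbers from the text; the count of docking hits (about 127) is derived from 34.8 percent of 365 rather than quoted, and using the HTS hit rate as the random-screening baseline is an assumption about how the enrichment was defined.

```python
# Hit-rate arithmetic for the PTP1B docking vs. HTS comparison (numbers from the text).


def hit_rate(hits: int, tested: int) -> float:
    """Fraction of tested compounds that qualify as hits."""
    return hits / tested


docking_rate = 0.348                      # 34.8 percent of the 365 docked candidates
docking_hits = round(docking_rate * 365)  # derived count, not quoted in the source
hts_rate = hit_rate(85, 400_000)          # 85 HTS hits out of 400,000 compounds

# Enrichment = docking hit rate relative to the HTS (baseline) hit rate.
enrichment = docking_rate / hts_rate
print(f"docking hits ~ {docking_hits}")
print(f"HTS hit rate = {hts_rate * 100:.3f} percent")
print(f"enrichment of docking over HTS ~ {enrichment:.0f}-fold")
```

This gives an enrichment of roughly 1,600-fold, close to the approximately 1,700-fold reported; the exact value depends on how the rates are rounded and on the baseline used.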
Crystallographic Screening
X-ray crystallographic screening is a modern combination of lead identification,
immediate X-ray crystallographic structural evaluation, and subsequent lead
optimization, using crystals that are soaked with ligand mixtures. Ingeniously,
the binding capability of some proteins in the crystalline state is employed to
"fish" out and present tight binders. The ligands with the highest affinity are
identified and selected for further rounds of optimization [79]. Compared with
conventional high-throughput screening, crystallographic screening (a) yields hits
with activities in the mM to 30 mM range, (b) yields hits with clearly defined
binding interactions, and (c) involves only minimal hit-to-lead synthetic
chemistry effort, because follow-up libraries can be focused using detailed
information from the crystal structure. Nienaber et al. [1] demonstrated that
crystallographic screening can be performed in a rapid, efficient, and
high-throughput fashion. They established the utility of the iterative process by
discovering 8-aminopyrimidyl-2-aminoquinoline, a new class of anticancer
urokinase inhibitors (Fig. 10.27). Initially, 61 compounds were divided into 9
separate mixtures of 6 to 8 compounds each. Care was taken to place into each
cocktail compounds with the greatest degree of structural diversity, to facilitate
subsequent ligand identification from the shape of the Fo–Fc electron density
maps. Nine urokinase crystals were soaked with individual cocktails, X-ray
diffraction data were collected, and the resulting electron density maps were
inspected. The maps revealed the shape and orientation of the bound compounds
(Fig. 10.27a). In one case two binders were present; removing the more prominent
compound and resoaking made it possible to identify the electron density of the
second ligand. In the subsequent lead optimization step, previous
structure–activity relationship (SAR) data were included, which led to the
development of 8-aminopyrimidyl-2-aminoquinoline,
a ligand with ca. 100-fold increased inhibitory potency (Ki = 0.37 μM) and
38 percent oral bioavailability, as determined by in vivo pharmacokinetic tests.
This type of process is capable of identifying weakly binding ligands (1 mM)
and is applicable wherever apo crystals are available and tolerate soaking.
Crystallographic screening may also be used to facilitate the validation of new
targets and the development of assays, and to assist in assigning biochemical
function to orphan targets.
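The cocktail design step, distributing 61 compounds over 9 soaks so that the members of each mixture are as dissimilar as possible, can be approximated with a greedy assignment. This is a hypothetical sketch: the source does not say how structural diversity was measured, so Tanimoto similarity on made-up fingerprint bit sets is used here as a stand-in, and the function names are illustrative.

```python
import random


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def assign_cocktails(fps: dict[str, frozenset], n_cocktails: int) -> list[list[str]]:
    """Greedily place each compound into the cocktail it least resembles.

    Only the currently smallest cocktails are eligible, so sizes never differ
    by more than one; among those, the cocktail whose most similar member is
    least similar to the new compound is chosen.
    """
    cocktails: list[list[str]] = [[] for _ in range(n_cocktails)]
    for name, fp in fps.items():
        smallest = min(len(c) for c in cocktails)
        eligible = [i for i, c in enumerate(cocktails) if len(c) == smallest]
        best = min(
            eligible,
            key=lambda i: max((tanimoto(fp, fps[m]) for m in cocktails[i]), default=0.0),
        )
        cocktails[best].append(name)
    return cocktails


# Hypothetical fingerprints: 61 compounds, each a random set of 20 "on" bits out of 256.
rng = random.Random(0)
fingerprints = {f"cpd{i:02d}": frozenset(rng.sample(range(256), 20)) for i in range(61)}

cocktails = assign_cocktails(fingerprints, n_cocktails=9)
for k, members in enumerate(cocktails, start=1):
    print(k, len(members), members)
```

Restricting each placement to the currently smallest cocktails keeps the mixtures at 6 or 7 compounds, within the 6 to 8 range described for the urokinase screen, while the similarity criterion keeps look-alike compounds in different soaks so that their electron densities can be distinguished.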
[Figure 10.27 artwork: panels A–C; labels include "Lead identification by crystallographic screening", "Lead optimization: structure-based design", "Non-orally available naphthamidine inhibitor", and "Optimized lead, Ki = 0.37 μM, 38% orally available 2-aminoquinoline inhibitor"; affinities shown for intermediate compounds range from Ki > 500 μM to 56 μM; residues S195 and D189 and the S1β pocket are marked.]
Figure 10.27 Urokinase lead identification via crystallographic screening and
optimization [1]. (a) Initial Fo–Fc electron density maps for ligands that were
identified from compound cocktail-soaked urokinase crystals. (b) Crystal structures
of 8-aminopyrimidyl-2-naphthamidine (orange) and a 2-aminoquinoline lead (blue).
(c) Structure and 2Fo–Fc electron density map for the optimized lead compound
8-aminopyrimidyl-2-aminoquinoline.
Crystallographic Fragment Screening
A variation on this theme is crystallographic fragment screening. Here crystals
are soaked with cocktails that contain small druglike fragments rather than
complete leadlike compounds. Once several fragments have been identified
crystallographically, they can be developed into new lead compounds (Fig. 10.28).
Notably, low-affinity small fragments that bind to adjacent pockets can be joined
to give a larger molecule with increased affinity. Typically, fragment libraries of
only a few hundred to a thousand compounds are screened. Crystallographic
fragment screening is a new and promising technology employed by several
biotechnology companies; however, specific drug discovery examples have not
yet been published in the scientific literature.
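The affinity gain from joining two fragments follows from the additivity of binding free energies. The sketch below is an idealized model, not a description of any specific project: it assumes both fragments keep their binding modes when linked and lumps all linker effects into one dimensionless linking coefficient E (E = 1 for perfect additivity, E > 1 when the linker costs affinity, E < 1 when linking is better than additive).

```python
# Idealized fragment linking. If binding free energies add,
#   RT ln(Kd_AB/c0) = RT ln(Kd_A/c0) + RT ln(Kd_B/c0)
# so Kd_AB = Kd_A * Kd_B / c0, with c0 = 1 M (standard state).
# A linking coefficient E (hypothetical, dimensionless) scales this ideal value.

C0 = 1.0  # standard-state concentration, mol/L


def linked_kd(kd_a: float, kd_b: float, linking_coefficient: float = 1.0) -> float:
    """Dissociation constant (M) of the linked compound under the additivity model."""
    return linking_coefficient * kd_a * kd_b / C0


ideal = linked_kd(1e-3, 1e-3)                              # two 1 mM fragments
strained = linked_kd(1e-3, 1e-3, linking_coefficient=100)  # linker costs affinity
print(f"ideal: {ideal:.1e} M, strained linker: {strained:.1e} M")
```

Under ideal additivity, two 1 mM fragments would give a 1 μM compound, which is why joining weak but well-placed fragments can be attractive even when each fragment alone is barely detectable.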
Rees et al. [81] discuss 25 examples of the successful application of the
fragment-based lead discovery approach, some of them aided by crystallographic
screening. They also formulate a "rule of three," according to which the average
fragment (a) has a molecular mass of less than 300 Da, (b) has no more than 3
hydrogen bond donors, (c) has no more than 3 hydrogen bond acceptors, and
(d) has a cLogP of no more than 3. In addition, the number of rotatable bonds
was on average no more than 3, and the polar surface area was about 60 Å².
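The rule-of-three criteria can be applied as a simple library filter. This sketch assumes the descriptors have already been computed by a cheminformatics toolkit; the Fragment class and the example values are hypothetical, and treating "about 60 Å²" as a hard polar surface area cutoff of 60 Å² is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Precomputed descriptors for one fragment (values supplied by the user)."""
    name: str
    mol_weight: float          # Da
    h_donors: int
    h_acceptors: int
    clogp: float
    rotatable_bonds: int
    polar_surface_area: float  # square angstroms


def rule_of_three_violations(f: Fragment, extended: bool = False) -> list[str]:
    """List the rule-of-three criteria a fragment fails (empty list = passes).

    Core criteria: MW < 300 Da, <= 3 H-bond donors, <= 3 acceptors, cLogP <= 3.
    With extended=True, also check <= 3 rotatable bonds and PSA <= 60 A^2.
    """
    checks = [
        (f.mol_weight < 300, "molecular weight >= 300 Da"),
        (f.h_donors <= 3, "more than 3 H-bond donors"),
        (f.h_acceptors <= 3, "more than 3 H-bond acceptors"),
        (f.clogp <= 3, "cLogP > 3"),
    ]
    if extended:
        checks += [
            (f.rotatable_bonds <= 3, "more than 3 rotatable bonds"),
            (f.polar_surface_area <= 60, "polar surface area > 60 A^2"),
        ]
    return [message for passed, message in checks if not passed]


# Hypothetical descriptor values for two library members.
small = Fragment("frag_1", 182.2, 2, 3, 1.4, 2, 55.0)
large = Fragment("frag_2", 341.4, 4, 3, 3.6, 5, 78.0)

for frag in (small, large):
    print(frag.name, rule_of_three_violations(frag, extended=True))
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easy to tolerate one borderline violation when a library would otherwise be too small.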
Site-Directed Leads via Fragment Tethering
An additional layer of complexity is added by generating site-directed leads
via fragment tethering ([2]; Fig. 10.29). In a first step, target proteins are
covalently modified at a particular site on the surface. Mass-spectrometric
detection allows the identification of weakly binding ligand precursors. In a second
Figure 10.28 Schematic of crystallographic fragment screening. Once fragments are
identified (a, b), they can be joined (c) to give a leadlike compound, or fragments
may be developed along the lines of conventional structure-based drug design.
(Figure taken from Jothi [80].)
step, X-ray crystallography provides the tool to observe the interaction of the
precursor with the protein and helps direct the chemical synthesis of fused
analogs with potentiated affinity. This strategy was used to generate a potent
inhibitor of the anticancer target thymidylate synthase [3]. Thymidylate synthase
contains an active-site cysteine; in proteins lacking a suitably placed cysteine,
this reactive group can be introduced by mutagenesis of a surface residue (native
amino acid to Cys). A library of 1200 disulfide-containing compounds was
screened in pools of up to 100 compounds. Several disulfide adducts were
detected by mass spectrometry. One of the selected compounds was investigated
further: the inhibition constant Ki of the tether-free analog N-tosyl-D-proline was
1.1 mM. Subsequent crystallographic structure determination of the thymidylate
synthase adduct and lead optimization improved the affinity more than 3000-fold.
This approach has been refined to extended tethering, in which the first
identified tethered compound serves as an anchor for the next fragment [82]. The
tethering approach enables drug design efforts to be nucleated at specific sites on
protein surfaces.
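Identifying which pool member formed a disulfide adduct comes down to mass arithmetic: forming protein–S–S–R from protein–SH and a fragment thiol R–SH changes the mass by M(R–SH) minus two hydrogens. The sketch below is hypothetical: the protein mass, fragment masses, and 2 Da tolerance are invented, and real tethering experiments contain other species (for example, adducts with the reducing agent) that are ignored here.

```python
H_MASS = 1.008  # average mass of hydrogen, Da


def adduct_mass(protein_mass: float, thiol_mass: float) -> float:
    """Mass of protein-S-S-R formed from protein-SH and the fragment thiol R-SH.

    Net reaction: protein-SH + R-SH -> protein-S-S-R + 2 H, so the mass shift
    is M(R-SH) - 2 * M(H).
    """
    return protein_mass + thiol_mass - 2 * H_MASS


def assign_peaks(protein_mass: float, pool: dict[str, float],
                 peaks: list[float], tolerance: float = 2.0) -> dict[float, list[str]]:
    """Map each observed (deconvoluted) mass to pool members whose adduct fits it."""
    return {
        peak: [name for name, thiol in pool.items()
               if abs(adduct_mass(protein_mass, thiol) - peak) <= tolerance]
        for peak in peaks
    }


# Hypothetical pool: fragment thiol masses (Da) chosen to be well separated.
pool = {"frag_a": 150.2, "frag_b": 210.3, "frag_c": 275.4}
observed = [30000.0, 30208.5]  # unmodified protein plus one adduct peak

result = assign_peaks(30000.0, pool, observed)
print(result)  # {30000.0: [], 30208.5: ['frag_b']}
```

Pool composition matters in this scheme: if two fragments in the same pool differ by less than the tolerance, a peak cannot be assigned unambiguously, which is one reason pools are designed with mass separation in mind.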
Structural Genomics
Current genome-wide structural genomics programs aim to determine
representative structures for all proteins. The goal is to infer biological function
from similar structures of known function [83]. Once representative structures
are obtained, structural homology models of all members of each protein
family can be built using these templates. This information is useful in
selecting and classifying drug targets because the crystallographic structure holds
information about protein function. Assigning function by comparison at the
DNA (deoxyribonucleic acid) level fails for some proteins that have different
sequences but similar folds. Furthermore, structures of proteins with bound
natural ligands may expose their mechanism of action and allow researchers to
act on this insight.
[Figure 10.29 artwork: three-step scheme: introduce a cysteine residue (SH) into the target; screen against a library of disulfide-containing small molecules (R–S–S–R); remove the disulfide from the selected molecule.]
Figure 10.29 Schematic illustration of the fragment tethering approach.
In a proof-of-principle experiment, Zarembinski et al. [84] demonstrated that the
function of a gene can indeed
be assigned by X-ray crystallographic structure determination. They reported
the 1.7-Å resolution structure of MJ0577, a protein from the hyperthermophile
Methanococcus jannaschii. Unexpectedly, a bound ATP molecule was identified
in the electron density map. The protein was therefore deduced to work as an
ATPase or an ATP-mediated molecular switch. The latter function was
confirmed by subsequent biochemical experiments. Similarly, a thymidylate
synthase complementing protein was discovered in a structural genomics project
by structure-based functional analysis, leading to its classification as an
antibacterial drug target [85].
At first sight, this approach may seem to be a digression, but it addresses a
grave problem in protein X-ray crystallography on new targets: there is no
guarantee of obtaining a structural result. Current success rates for generating
structures of novel protein targets are below 50 percent. However, once the
scope of targets is broadened, for example by including homologous proteins
from various organisms, the probability of successfully generating an X-ray
crystallographic structure increases dramatically.
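The benefit of pursuing several homologs can be estimated with elementary probability. The sketch assumes, hypothetically, a 30 percent success rate per construct and independence between constructs; in reality homologs tend to succeed or fail together, so the real gain is smaller than this optimistic estimate.

```python
def p_at_least_one_structure(p_single: float, n_constructs: int) -> float:
    """Probability that at least one of n independent attempts yields a structure."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p_single) ** n_constructs


p = 0.3  # hypothetical per-construct success rate (the text only says "below 50 percent")
for n in (1, 3, 5, 10):
    print(n, round(p_at_least_one_structure(p, n), 3))
```

Even with a modest per-construct success rate, a handful of homologs pushes the chance of obtaining at least one structure well above the single-target rate, which is the logic behind expanding the target scope.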
The number of potential drug targets has increased from about 500 in the
mid-1990s to several thousand possible targets these days. It is the goal of
structural genomics programs to provide structures of these new drug targets,
even those of not yet identified targets or those proteins that may be used in
the future for expedient homology model-building purposes once homologous
proteins have been identified as useful drug targets. The formation of the
Structural Genomics Consortium and its financial support by GlaxoSmithKline
attest to the value of this approach. The consortium's goal is to determine
350 X-ray crystallographic structures of proteins directly related to human health,
including proteins associated with cancer and with neurological and infectious
diseases, within 3 years, starting in 2003.
The abundance of protein structural information has fueled large-scale
bioinformatics approaches that aid the drug discovery process at many stages.
One particular application could be to improve the final drug product profile by
systematically biasing lead optimization away from all known structures,
treating their entirety as antitargets.
10.7 LIMITATIONS AND CHALLENGES OF X-RAY
CRYSTALLOGRAPHY IN THE DRUG DISCOVERY PROCESS
Despite the outstanding track record of X-ray crystallography as a tool for drug
discovery, its use has several limitations. The most serious ones result from
(a) protein flexibility, (b) shortcomings in the computational methods that use
structural information to quantify the energetics of protein–ligand interactions,
and (c) difficulties in applying the crystallographic method to challenging targets
such as membrane proteins.
stabilizes a particular conformational state out of the pool of many low-energy
states in which proteins can exist at thermal equilibrium. In the process of SBDD,
this causes a high degree of unpredictability, making the method less useful.
However, some proteins are less flexible, and their conformation hardly
changes when ligands bind. These targets are particularly amenable to
conventional SBDD efforts. There are only a few solutions to this fundamental
predicament [89], notably the computationally intensive approach of treating the
protein as a flexible entity, and the tethering discovery approach.
Understanding this limitation, however, may serve as the best antidote against
overuse of this tool.
Sanders et al. [90] point out a serious shortcoming of crystallographic
screening. They described the discovery of competitive inhibitors of
dihydroneopterin aldolase via crystallographic screening and demonstrated that
several compounds with IC50 values around 1 mM were negative in crystal
soaking experiments. Apparently, the conformational shift associated with the
binding of these missed compounds could not occur in the preformed crystal,
which prevented their association with the protein.
The deficiencies of current computational methods in properly quantifying the
interactions of proteins with ligands are one consequence of molecular
flexibility. But even more fundamentally, our current understanding of the
energetics of ligand–protein interaction, and hence its proper quantification by
scoring functions, is limited [91]. One weak point, for instance, remains the
description of entropic terms for binding interactions, although progress is
being made, and an energetic penalty of 10 to 30 kcal/mol is estimated for
protein reorganization upon binding [92].
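To put these energies in perspective, binding constants can be converted to standard free energies with ΔG° = RT ln(Ki/c°), taking c° = 1 M and treating Ki as a dissociation constant (an assumption that holds for simple competitive inhibition). The sketch uses the tethering numbers quoted earlier in this chapter (Ki = 1.1 mM for N-tosyl-D-proline and an improvement of more than 3000-fold).

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
T = 298.15            # temperature, K
C0 = 1.0              # standard-state concentration, mol/L


def binding_free_energy(ki_molar: float) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation/inhibition constant."""
    return R_KCAL * T * math.log(ki_molar / C0)


def fold_to_ddg(fold_improvement: float) -> float:
    """Free-energy gain (kcal/mol) corresponding to an n-fold affinity improvement."""
    return R_KCAL * T * math.log(fold_improvement)


ki_start = 1.1e-3               # tether-free N-tosyl-D-proline, 1.1 mM
ki_optimized = ki_start / 3000  # after the >3000-fold improvement quoted in the text
print(f"start:     {binding_free_energy(ki_start):.2f} kcal/mol")
print(f"optimized: {binding_free_energy(ki_optimized):.2f} kcal/mol")
print(f"3000-fold corresponds to {fold_to_ddg(3000):.2f} kcal/mol")
```

The net binding free energies, about -4 and -9 kcal/mol, are smaller than the estimated 10 to 30 kcal/mol reorganization penalty, so that penalty must be offset by other favorable terms; a small relative error in any of these large, compensating contributions therefore translates into a large error in the predicted affinity.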
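[Figure 10.31 artwork: superposed models around residues Thr-137, Lys-138, Gly-139, Gly-140, and Gln-141, with ordered water molecules H2O-236, H2O-237, and H2O-247 labeled.]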
Figure 10.31 Structural heterogeneity in human interleukin-1β. The ensemble of
models displays considerable backbone variability, disordered side chains, and
multiple locations of water molecules. The models were obtained from the same
set of 2.3-Å resolution X-ray diffraction data and refined to similar levels.
(Image taken from DePristo et al. [87].)
Finally, due to major technical bottlenecks X-ray crystallography cannot be
applied to all drug discovery programs. The reason for this is the high uncer-
tainty involved in obtaining any useful results with a given allocation of
resources. The uncertainty is caused by the many risk-fraught steps involved
in the crystallographic endeavor. Modern high-throughput crystallographic
technologies seek to reduce this risk and have in some cases provided de novo
structural information within less than 2 months at reasonable cost. This is not
a given, though, since the outcome of a particular X-ray crystallographic
project is unpredictable. This weakness reduces the applicability of X-ray crys-
tallography to about half of all discovery projects in major pharmaceutical
companies.
At the time of this writing, almost 30,000 protein structures were available
from the PDB, an impressive record of the methodology's performance and
significance. However, structures are not available for many of the ca. 500
proteins targeted by current drugs, because these proteins have not been
crystallized and their crystallographic structures have not been determined. This
is particularly bothersome for membrane proteins, which represent half of the
proteins targeted by today's marketed drugs [19]. Some 45 percent of the
molecular targets of known drugs are G-protein-coupled receptors (GPCRs),
proteins that have been notoriously difficult to handle in the stages of
overexpression, purification, and crystallization. Indeed, to date only one
structure of such a protein has been determined, that of bovine rhodopsin, along
with several structures of similar bacterial proteins with seven transmembrane
helices. It is anticipated that committed efforts in crystallographic structure
determination projects can surmount the experimental barriers and yield
structural information on this highly important protein class. The current state
of SBDD on such valuable GPCR targets is reminiscent of the situation in the
early 1970s, when homology models based on carboxypeptidase A were
successfully applied in the "design" of ACE inhibitors. Indeed, the first
"virtually" discovered compound targeting a predicted structure of a serotonin
receptor (a GPCR) entered phase 1 trials in 2004, and Abbott's drug atrasentan,
which targets the endothelin-A receptor, was optimized with the help of a
homology model of the endothelin-A receptor (also a GPCR; [93]).
Acknowledgments We are greatly indebted to Dr. Wendy Sanderson and Dr.
Ehmke Pohl for thoroughly reading and revising parts of this chapter.
REFERENCES
1. Nienaber, V. L., Richardson, P. L., Klighofer, V., Bouska, J. J., Giranda, V. L., Greer,
J. (2000). Discovering novel ligands for macromolecules using X-ray crystallo-
graphic screening. Nat. Biotechnol., 18, 1105–1108.
2. Erlanson, D., Wells, J., Braisted, A. (2004). Tethering: Fragment-based drug discov-
ery. Annu. Rev. Biophys. Biomol. Struct., 33, 199–223.
3. Erlanson, D. A., Braisted, A. C., Raphael, D. R., Randal, M., Stroud, R. M., Gordon,
E. M., Wells, J. A. (2000). Site-directed ligand discovery. PNAS, 19, 9367–9372.
4. Anderson, A. C. (2003). The process of structure-based drug discovery. Chem. Biol.,
10, 787–797.
5. Doman, T. N., McGovern, S. L., Witherbee, B. J., Kasten, T. P., Kurumbail, R.,
Stallings, W. C., Connolly, D. T., Shoichet, B. K. (2002). Molecular docking and high-
throughput screening for novel inhibitors of protein tyrosine phosphatase-1B.
J. Med. Chem., 45, 2213–2221.
6. Bergfors, T. M. (1999). Protein Crystallization, Techniques, Strategies and Tips; a Lab
Manual. International University Line, La Jolla, CA.
7. McPherson, A. (1989). Preparation and Analysis of Protein Crystals. Krieger
Publishing.
8. McPherson, A. (1998). Crystallization of Biological Macromolecules. Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, NY.
9. Ducruix, A., Giegé, R. (1992). Crystallization of Nucleic Acids and Proteins, a Prac-
tical Approach. IRL Press.
10. Weber, P. C. (1991). Physical principles of protein crystallization. Adv. Protein
Chem., 41, 1–36.
11. Durbin, S. D., Feher, G. (1996). Protein crystallization. Annu. Rev. Phys. Chem., 47,
171–204.
12. McPherson, A. (1990). Current approaches to macromolecular crystallization. Eur.
J. Biochem., 189, 1–23.
13. Chernov, A. A. (2003). Protein crystals and their growth. J. Struct. Biol., 142, 3–21.
14. Drenth, J., Haas, C. (1992). Protein crystals and their stability. J. Cryst. Growth, 122,
107–109.
15. Chayen, N. E. (1998). Comparative studies of protein crystallization by vapour-
diffusion and microbatch technique. Acta Crystallogr., D54, 8–15.
16. Chayen, N. E., Shaw Stewart, P. D., Blow, D. M. (1992). Microbatch crystallization
under oil—A new technique allowing many small-volume crystallization trials.
J. Cryst. Growth, 122, 176–180.
17. Chayen N. E. (1997). The role of oil in macromolecular crystallization. Structure,
5, 1269–1274.
18. Zeppezauer, M. (1971). In W. B. Jakoby (Ed.), Methods in Enzymology, Vol 22.
Academic, New York and London.
19. Drews, J. (2000). Drug discovery: A historical perspective. Science, 287, 1960–1964.
20. Iwata, S., (Ed.) (2003). Methods and Results in Crystallization of Membrane Pro-
teins. International University Line, Biotechnology Series.
21. Hunte, C. C., von Jagow, G., Schägger, H. (2003). Membrane Protein Purification
and Crystallization: A Practical Guide. Academic, San Diego.
22. Nollert, P., Navarro, J., Landau, E. M. (2001). Crystallization of membrane proteins
in cubo. Methods Enzymol., 343, 183–199.
23. Terpe, K. (2003). Overview of tag protein fusions: From molecular and biochemi-
cal fundamentals to commercial systems. Appl. Microbiol. Biotechnol., 60, 523–533.
24. Smyth, D. R., Mroziewicz, M. K., McGrath, W. J., Listwan, P., Kobe, B. (2003). Crystal
structures of fusion proteins with large-affinity tags. Protein Sci., 12, 1313–1322.
87. DePristo, M. A., de Bakker, P. I. W., Blundell, T. L. (2004). Heterogeneity and inac-
curacy in protein structures solved by X-ray crystallography. Structure, 12, 831–838.
88. Davis, A., Teague, S. (1999). Hydrogen bonding, hydrophobic interactions, and
failure of the rigid receptor hypothesis. Angew. Chem. Int. Ed. Engl., 38, 736–749.
89. Teague, S. J. (2003). Implications of protein flexibility for drug discovery. Nat. Rev.
Drug Discov., 2, 527–541.
90. Sanders, W. J., Nienaber, V. L., Lerner, C. G., McCall, J. O., Merrick, S. M., Swanson,
S. J., Harlan, J. E., Stoll, V. S., Stamper, G. F., Betz, S. F., Condroski, K. R., Meadows,
R. P., Severin, J. M., Walter, K. A., Magdalinos, P., Jakob, C. G., Wagner, R., Beutel,
B. A. (2004). Discovery of potent inhibitors of dihydroneopterin aldolase using
CrystaLEAD high-throughput X-ray crystallographic screening and structure-
directed lead optimization. J. Med. Chem., 47, 1709–1718.
91. Reddy, M. R., Erion, M. D. (Eds.) (2004). Free Energy Calculations in Rational
Drug Design. Kluwer Academic.
92. Verkhivker, G. M., Bouzida, D., Gehlhaar, D. K., Rejto, P. A., Freer, S. T., Rose,
P. W. (2002). Complexity and simplicity of ligand-macromolecule interactions:
The energy landscape perspective. Curr. Opin. Struct. Biol., 12, 197–203.
93. Thiel, K. A. (2004). Structure-aided drug design’s next generation. Nat. Biotech-
nol., 22(5), 513–519.
BIBLIOGRAPHY
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve,
R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M.,
Simonson, T., Warren, G. L. (1998). Crystallography & NMR system: A new soft-
ware suite for macromolecular structure determination. Acta Crystallogr., D54,
905–921.
Diederichs, K., Karplus, P. A. (1997). Improved R-factors for diffraction data analysis
in macromolecular crystallography. Nat. Struct. Biol., 4, 269–275.
Landau, E. M., Rummel, G., Cowan-Jacob, S. W., Rosenbusch, J. P. (1997). Crystalliza-
tion of a polar protein and small molecules from the aqueous compartment of lipidic
cubic phases. J. Phys. Chem. B, 101(11), 1935–1937.
Stout, G. H., Jensen, L. H. (1989). X-ray Structure Determination: A Practical Guide.
Wiley, New York, p. 24.
Stricher, M. L. F., Misse, D., Sironi, F., Pugniere, M., Barthe, P., Prado-Gotor, R., Freulon,
I., Magne, X., Roumestand, C., Menez, A., Lusso, P., Veas, F., Vita, C. (2003). Ratio-
nal design of a CD4 mimic that inhibits HIV-1 entry and exposes cryptic neutral-
ization epitopes. Nat. Biotechnol., 21, 71–76.
Thiessen, K. J. (1994). The use of two novel methods to grow protein crystals by micro-
dialysis and vapor diffusion in an agarose gel. Acta Crystallogr., D50, 491–495.
Zarembinski, T. I., Hung, L.-W., Mueller-Dieckmann, H.-J., Kim, K.-K., Yokota, H.,
Kim, R., Kim, S.-H. (1998). Structure-based assignment of the biochemical function
of a hypothetical protein: A test case of structural genomics. PNAS, 95(26),
15189–15193.