Proc. Natl. Acad. Sci. USA, Vol. 95, pp. 9903–9908, August 1998, Biophysics
The structural distribution of cooperative interactions in proteins: Analysis of the native state ensemble
VINCENT J. HILSER*†, DAVID DOWDY‡, TERRENCE G. OAS‡, AND ERNESTO FREIRE*§
*Department of Biology and Biocalorimetry Center, The Johns Hopkins University, Baltimore, MD 21218; and ‡Department of Biochemistry, Duke University, Durham, NC 27710
Communicated by Saul Roseman, The Johns Hopkins University, Baltimore, MD, June 16, 1998 (received for review March 11, 1998)
ABSTRACT Cooperative interactions link the behavior of different amino acid residues within a protein molecule. As a result, the effects of chemical or physical perturbations to any given residue are propagated to other residues by an intricate network of interactions. Very often, amino acids “sense” the effects of perturbations occurring at very distant locations in the protein molecule. In these studies, we have investigated by computer simulation the structural distribution of those interactions. We show here that cooperative interactions are not intrinsically bi-directional and that different residues play different roles within the intricate network of interactions existing in a protein. The effect of a perturbation to residue j on residue k is not necessarily equal to the effect of the same perturbation to residue k on residue j. In this paper, we introduce a computer algorithm aimed at mapping the network of cooperative interactions within a protein. This algorithm exhaustively performs single site thermodynamic mutations to each residue in the protein and examines the effects of those mutations on the distribution of conformational states. The algorithm has been applied to three different proteins (λ repressor fragment 6–85, chymotrypsin inhibitor 2, and barnase). This algorithm accounts well for the observed behavior of these proteins.
Protein folding is a highly cooperative process.
One of the most notable manifestations of cooperativity is that the vast majority of conformational states that are accessible to a protein have a negligible probability and never become populated to a significant extent. In most situations, the folding/unfolding equilibrium is well accounted for by a two-state process in which the population of intermediates is assumed to be zero (1). Despite this fact, there is ample evidence, particularly from hydrogen exchange data obtained under native conditions, that some partially folded conformations are always present. The observed heterogeneity in the magnitude of the hydrogen exchange protection factors measured under native conditions indicates that certain residues become exposed to the solvent as a result of local rather than global unfolding (2–13). If this is the case, cooperative interactions do not involve the entire protein molecule, and the conformational equilibrium cannot be considered as an all-or-none process in which the entire protein is either folded or unfolded. If cooperative interactions do not extend uniformly throughout the entire protein molecule, then some residues will have a more important role than others in defining cooperativity. The purpose of this paper is to identify those residues and investigate the structural distribution of cooperative interactions in proteins.
From a rigorous point of view, cooperativity originates when the partition function of a system cannot be written as the product of the individual partition functions of the constituent subsystems. This situation occurs when the interaction energy among different subsystems is not zero. In proteins, different structural elements interact with one another, establishing a hierarchical web that essentially extends throughout the entire protein. As a result, the Gibbs energy of each residue becomes a composite function of this intricate network of interactions.
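The factorization criterion can be made concrete with a minimal worked example. It is a sketch only, and everything in it is an assumption introduced for illustration rather than taken from the paper: two subsystems A and B, each either folded or unfolded, with intrinsic unfolding constants $K_A$ and $K_B$ and an interaction Gibbs energy $\Delta g_{AB}$ that applies only when both are unfolded. Taking the fully folded state as the reference, the partition function is

$$Q = 1 + K_A + K_B + \phi K_A K_B = (1 + K_A)(1 + K_B) + (\phi - 1) K_A K_B, \qquad \phi = e^{-\Delta g_{AB}/RT}$$

In this toy case $Q$ reduces to the product of the individual partition functions $(1 + K_A)$ and $(1 + K_B)$ only when $\phi = 1$, that is, when $\Delta g_{AB} = 0$. Any nonzero interaction energy leaves the cross term $(\phi - 1) K_A K_B$, and the probability of finding A unfolded then depends on the state of B.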
Deciphering this network from an analytical point of view is an enormous, if not hopeless, task. An alternative approach is to use large scale computer simulations. To investigate the way in which cooperative interactions propagate in a protein, one ideally would set up an experiment in which the intrinsic Gibbs energy of each residue is changed one at a time, and the effects of each change on all other residues are examined. In this ideal experiment, only the energy and not the chemical nature or atomic dimensions of each residue is changed so that no structural perturbations are introduced. In the real world, this ideal experiment cannot be realized. At best, it only can be approximated by performing Ala→Gly mutations at solvent exposed locations. With the computer, however, the energy itself can be “mutated” without the structure or the amino acid sequence of the protein being affected. We call this technique single site thermodynamic mutation (SSTM), and we show here that it can be used to identify and characterize a number of fundamental aspects of cooperativity in proteins.
MATERIALS AND METHODS
Uniformly ¹⁵N-labeled λ6–85 was expressed and purified as described (14). Two NMR samples were prepared, one at pH 5.00 and one at pH 6.98. None of the pH values reported here has been corrected for isotope effects, regardless of deuterium content. Amide hydrogen exchange rates were determined as described (14) with the following exception: Instead of 10 mM CD₃COOD, the pH 5.00 exchange buffer contained 1 mM EDTA and 20 mM CD₃COOD, and the pH 6.98 buffer contained 1 mM EDTA and 20 mM D₃PO₄. Final protein concentrations were approximately 1 mM. HSQC spectra were acquired on a Varian 600-MHz NMR spectrometer set at 15 °C ± 0.5 °C. The time between initiation of exchange and collection of the first data point was 15 min at pH 6.98 and 5 min at pH 5.00.
Each 2D spectrum consisted of 4,096 data points in the ¹H dimension, covering a sweep width of 8,000 Hz, and 128 points in the ¹⁵N dimension, over a sweep width of 1,350 Hz. Five initial spectra were taken at intervals of 16 min with 2 transients each, and all subsequent spectra consisted of 16 or 32 transients taken at minimum intervals of 2 hr. Only those peaks with heights above baseline for the first five spectra were used to calculate exchange rates. Twenty to thirty spectra were used in each experiment. Peak heights were adjusted according to the number of transients in a given spectrum. Data were processed by using a combination of NMRPIPE (National Institutes of Health), FELIX (Biosym Technologies, San Diego), and KALEIDAGRAPH (Synergy Software, Reading, PA) software. Amide exchange rates were calculated by fitting peak heights versus time in KALEIDAGRAPH. Intrinsic rate constants were calculated by using the method of Bai et al. (15). All rates slower than 5 × 10⁶ (22 residues) were determined by using the pH 6.98 data, and the remainder (33 residues) were calculated with the pH 5.00 data.
Abbreviations: SSTM, single site thermodynamic mutation; CI2, chymotrypsin inhibitor 2; kcal, kilocalorie.
†Present address: Department of Human Biological Chemistry and Genetics, University of Texas Medical Branch at Galveston, TX 77555.
§To whom reprint requests should be addressed. e-mail: bcc@biocal2.bio.jhu.edu.
© 1998 by The National Academy of Sciences.
Computer Simulation of the Equilibrium Ensemble of Protein Conformations. Previously, the COREX algorithm, which generates a large number of partially folded states of a protein from the high resolution crystallographic or NMR structure, was introduced (12, 13, 16). In this algorithm, the ensemble of partially folded states of a protein is approximated with the computer by using the high resolution structure as a template. Within this framework, the entire protein is considered as being composed of different folding units. Partially folded states are generated by folding and unfolding these units in all possible combinations. The division of the protein into a given number of folding units is called a partition. To maximize the number of distinct partially folded states, different partitions are included in the analysis. Each partition is defined by placing a block of windows over the entire sequence of the protein. The folding units are defined by the location of the windows irrespective of whether they coincide with specific secondary structure elements. By sliding the entire block of windows one residue at a time, different partitions of the protein are obtained. For two consecutive partitions, the first and last amino acids of each folding unit are shifted by one residue.
This procedure is repeated until the entire set of partitions has been exhausted (see ref. 12 for details). Typically, on the order of 10⁵ partially folded conformations are generated with the COREX algorithm. For the proteins λ6–85, chymotrypsin inhibitor 2 (CI2), and barnase considered in this paper, windows of 5, 5, and 8 amino acid residues were used, resulting in 2.6 × 10⁵, 0.4 × 10⁵, and 1.1 × 10⁵ partially folded conformations, respectively. Each of the states generated by the COREX algorithm is characterized by having some regions folded and some other regions unfolded. There are two basic assumptions in this algorithm: (i) the folded regions in partially folded states are native-like and (ii) the unfolded regions are devoid of structure. The thermodynamic quantities (ΔH, ΔS, ΔCp, and ΔG) for each state as well as the partition function and probability of each state (Pi) are evaluated by using an empirical parameterization of the energetics (17–22). The resultant distribution of states can be used to estimate an important descriptor of the residue-specific equilibrium, the residue stability constant (κf). This quantity is the ratio of the summed probabilities of all states in which a residue (j) is in a folded conformation to the summed probabilities of all states in which that residue is in an unfolded conformation and can be expressed as:
$$\kappa_{f,j} = \frac{\sum P_{f,j}}{\sum P_{nf,j}} \qquad [1]$$
It has been shown, through the analysis of various protein structures, that residue specific equilibria calculated according to Eq. 1 provide quantitative agreement with those obtained experimentally from amide hydrogen exchange experiments (i.e., protection factors) (12, 13, 16). The reasonable prediction of hydrogen exchange protection factors indicates that this approach effectively captures (albeit implicitly) cooperative interactions within the protein and correctly reproduces the most probable distribution of partially folded conformations.
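The enumeration of states and the evaluation of Eq. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the COREX implementation: the function names are hypothetical, folding units at the chain ends are simply truncated, and `toy_delta_g` is a made-up energy function standing in for the empirical parameterization of refs. 17–22.

```python
import math
from itertools import product

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def partitions(n_residues, window):
    """Yield one partition per window offset as a list of (start, end) units.

    Assumption: units at the chain ends are truncated rather than padded.
    """
    for offset in range(window):
        bounds = sorted({0, n_residues, *range(offset, n_residues, window)})
        yield list(zip(bounds, bounds[1:]))


def enumerate_states(n_residues, window):
    """Return the distinct states; each is a tuple of booleans (True = folded)."""
    states = set()
    for units in partitions(n_residues, window):
        # Fold and unfold the units of this partition in all combinations.
        for pattern in product((True, False), repeat=len(units)):
            state = [False] * n_residues
            for (start, end), folded in zip(units, pattern):
                if folded:
                    state[start:end] = [True] * (end - start)
            states.add(tuple(state))
    return states


def toy_delta_g(state, cost_per_residue=2.0):
    """Hypothetical Gibbs energy (kcal/mol) relative to the native state."""
    return cost_per_residue * state.count(False)


def stability_constants(states, delta_g, temperature=298.15):
    """Eq. 1 for every residue j.

    Probabilities share the same partition function, so the ratio of summed
    probabilities equals the ratio of summed statistical weights.
    """
    rt = R_KCAL * temperature
    weights = {s: math.exp(-delta_g(s) / rt) for s in states}
    n_residues = len(next(iter(states)))
    kappa = []
    for j in range(n_residues):
        folded = sum(w for s, w in weights.items() if s[j])
        unfolded = sum(w for s, w in weights.items() if not s[j])
        kappa.append(folded / unfolded)
    return kappa
```

Calling `stability_constants(enumerate_states(6, 3), toy_delta_g)` returns one stability constant per residue for a six-residue toy chain. Storing states in a set removes the duplicates (such as the fully folded and fully unfolded states) that every partition generates.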
If this is the case, the derived ensemble of partially folded conformations can be used to construct a structural map of the cooperative network within the protein.
Mapping Cooperativity: SSTM. Within the context of the statistical approach described above, cooperativity can be examined by changing the free energy of all states in which a particular residue is folded, in essence performing a nonperturbing energy mutation of that residue. The resultant change in the statistical weight of all states in the numerator of Eq. 1 leads to a redistribution of the probabilities. As the subset of states in which a particular residue is folded (and unfolded) differs for each residue, the effect of a thermodynamic mutation will be specific for each residue in the protein. By performing individual thermodynamic mutations to each residue in the protein, it is possible to evaluate the effect of a change in each residue on all other residues. The end result of the SSTM analysis is a map from which the cooperative network of interactions within the protein can be deduced.
We illustrate this analysis with the protein lambda 6–85 (λ6–85), a fragment of the lambda repressor, which contains residues 6 to 85 and which folds into a single domain (23). This protein has been well characterized from both a structural and an energetic point of view (24). The pattern of hydrogen exchange protection is predicted well by the COREX algorithm as illustrated in Fig. 1. The agreement in the pattern and amplitude between predicted and experimental protection factors indicates that the algorithm correctly captures the interaction energies within λ6–85. The most significant discrepancy is with Arg 17, which, in the x-ray structure, forms a salt bridge with Asp 14 and is predicted to have a higher than observed protection. Overall, the average deviation between predicted and experimental values amounts to ±0.9 kilocalories (kcal)/mol.
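The SSTM procedure described under "Mapping Cooperativity" can be sketched as follows. This is a hedged illustration, not the authors' code: the two-residue `TOY_ENSEMBLE` and its energies are invented, the size of the energy mutation (`ddg`, in kcal/mol) is arbitrary, and reporting the response as RT ln(κ′/κ) is an assumption made here for convenience.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 °C


def stability_constants(weights):
    """Eq. 1 from statistical weights keyed by state (tuple of bools, True = folded)."""
    n_residues = len(next(iter(weights)))
    return [
        sum(w for s, w in weights.items() if s[j])
        / sum(w for s, w in weights.items() if not s[j])
        for j in range(n_residues)
    ]


def sstm_map(delta_g, ddg=-1.0):
    """Return response[j][k]: effect on residue k of energy-mutating residue j.

    delta_g maps each state to its Gibbs energy (kcal/mol). The mutation adds
    ddg to every state in which residue j is folded; the structure is untouched.
    The response is RT * ln(kappa_mutated / kappa_reference) in kcal/mol.
    """
    base_weights = {s: math.exp(-g / RT) for s, g in delta_g.items()}
    base = stability_constants(base_weights)
    factor = math.exp(-ddg / RT)
    response = []
    for j in range(len(base)):
        # Rescale the weight of every state in which residue j is folded.
        mutated = {s: w * factor if s[j] else w for s, w in base_weights.items()}
        kappa = stability_constants(mutated)
        response.append([RT * math.log(kappa[k] / base[k]) for k in range(len(base))])
    return response


# Hypothetical two-residue ensemble with non-additive energies (kcal/mol).
TOY_ENSEMBLE = {
    (True, True): 0.0,
    (True, False): 1.0,
    (False, True): 3.0,
    (False, False): 2.0,
}
```

For this toy ensemble, `sstm_map(TOY_ENSEMBLE)` gives different values for `response[0][1]` and `response[1][0]`, a small-scale illustration of the point that the effect of mutating residue j on residue k need not equal the effect of mutating k on j. When the state energies are strictly additive, the off-diagonal responses vanish.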
The experimental protection factors show variations in magnitude that reflect the existence of partially folded conformations that are within 3 kcal/mol or less from the native state. The existence of this “fine structure” in the pattern of hydrogen exchange protection defines the native state as a dynamically fluctuating subensemble of conformations.
FIG. 1. Natural logarithm (bars) of the calculated and experimental protection factors for λ6–85. The calculated values were determined as described by Hilser and Freire (12). The solid line above the calculated values represents the residue stability constant as defined from Eq. 1. This quantity is defined for all residues independently of whether they exhibit protection or not. Shown also in the figure are the corresponding elements of secondary structure. The good agreement between calculated and experimental values indicates that the calculated ensemble captures the general features of the actual ensemble and that the network of cooperative interactions in the protein is represented accurately in this model.
Shown in Fig. 2A is the SSTM analysis of λ6–85. In this representation, the energy-mutated residue lies on the ab-