Estimating the configurational en...
ABSTRACT During the past decades, the calculation of accurate free-energy differences from molecular simulations has become feasible in practice. In contrast, the reliable estimation of absolute entropies and entropy differences from these simulations is a notoriously difficult problem. This article investigates critically the method to estimate configurational entropies from molecular dynamics simulations based on the quasi-harmonic approximation. The theory, assumptions, and approximations underlying this method are presented, as well as its connection with essential- mode and normal-mode analyses. In particular, the following points are considered: (i) the relationship between quasi-harmonic and essential modes (ii) the requirement of mass-weighting (or metric- tensor-weighting) in quasi-harmonic analysis (iii) the effect of anharmonicities in the individual modes on the estimated entropy (iv) the effect of pairwise (supralinear) correlations among the different modes on the estimated entropy. The analyses are carried out in the context of long (hundreds of nanoseconds) molecular dynamics simulations involving the reversible folding of ��- peptides, considering individually the specific properties of the folded and unfolded ensembles. The anharmonicity correction to the quasi- harmonic entropy is small. In contrast, the pairwise (supralinear) correlation correction is Estimating the configurational entropy from molecular dynamics simulations: anharmonicity and correlation corrections to the quasi-harmonic approximation large and affects to a larger extent the entropy of the folded state than that of the unfolded state. The proposed procedure to evaluate corrections for anharmonicity and correlation effects allows for an improved calculation of absolute entropies, as well as of entropy differences for molecular systems which undergo conformational transitions. KEYWORDS: computer simulation entropy quasi-harmonic analysis anharmonicity correlation peptide folding 1. INTRODUCTION Entropy is a key property to the understanding of a wide variety of physical, chemical and biochemical phenomena [1-8]. In particular, entropic effects play a fundamental role in protein folding [9-15] and stability [16-19], molecular association [20-22], ligand binding [18,23-29], and enzyme catalysis . Often, chemical and biochemical processes are characterized by a delicate balance between a large change in enthalpy and a large opposite change in entropy (enthalpy-entropy compensation), resulting in comparatively small free energy differences [4-10,12,16,20,24,29,31,32]. During the past decades, the calculation of accurate free-energy differences from molecular simulations has become feasible in practice [33- 43]. In contrast, the reliable estimation of absolute entropies and entropy differences from these Laboratorium f��r Physikalische Chemie, ETH, ETH-H��nggerberg, CH-8093 Z��rich, Switzerland Riccardo Baron, Wilfred F. van Gunsteren and Philippe H. H��nenberger* *Corresponding author firstname.lastname@example.org T r e n d s i n P h y s i c a l C h e m i s t r y Vol. 11, 2006
88 Riccardo Baron et al. In practice, however, the quasi-harmonic entropy is evaluated by application of the quantum- mechanical equation for the entropy of a harmonic oscillator, rather than the classical one. This point is essential in the presence of stiff degrees of freedom (e.g. bond-stretching or bond-angle bending vibrations, geometric constraints), because the quantum-mechanical entropy converges to zero in the limit of high oscillation frequencies, while the corresponding classical entropy incorrectly diverges to minus infinity. The quasi-harmonic approach was applied to biomolecular systems [16,44,45,61,62], and further extended by critical analyses of the approximations involved and of the resulting errors [45-47,53,61]. The application of the method as initially suggested by Karplus and Kushick  has a number of drawbacks: (i) the use of internal coordinates complicates the computational implementation of the method (ii) an approximation must be made for the configurational dependence of the Jacobian associated with the internal-to-Cartesian coordinate transformation (metric-tensor effects) (iii) the choice of a non-redundant set of internal coordinates is not unique and affects the resulting entropy estimate (iv) some internal coordinates (e.g. torsional angles) are periodic (i.e. their representation by a single-well harmonic oscillator is questionable and the associated averages and covariances cannot be unambiguously defined). About ten years after the introduction of the quasi- harmonic approach, Schlitter suggested that the above problems could be alleviated by performing the analysis (still restricted to bound systems) in terms of Cartesian coordinates instead of internal ones . He also realized that the method was capable of estimating absolute (rather than relative) entropies and suggested that, in the limit of infinite sampling, the estimated absolute entropy (i.e. that of a multi-dimensional system harmonic in a specified set of generalized coordinates and with metric-tensor-weighted coordinate averages and covariances identical to those monitored during the simulation) always provides an upper-bound to the true entropy of the simulated system. As shown in Appendix A, this latter statement is only rigorously exact at the classical level for a generalized coordinate system involving a configuration-independent Jacobian for the generalized-to-Cartesian coordinate simulations is a notoriously difficult problem [41,44-53] and one of the key current challenges in computational chemistry. In principle, the calculation of exact (within the force field and simulation methodology employed) absolute entropies from molecular simulations is an impossible task, because this quantity measures the overall extent of accessible phase space (i.e. would require infinitely long simulations to converge) [41,52]. There are two ways of circumventing this problem. The first approach is to restrict the task to the calculation of the entropy difference between two states of the system. In this case, appropriate generalizations of the coupling-parameter approach commonly used to evaluate free-energy differences can be used to estimate entropy differences via e.g. free-energy perturbation [52,54-56], thermodynamic integration [38,52,55], particle insertion [57,58], umbrella sampling , or finite-temperature differences . In this case, appropriate sampling is only required for the regions of phase space where the Hamiltonians corresponding to the two states differ significantly. Unfortunately, this sampling still generally requires (series of) prohibitively long simulations and the method only provides converged results in favorable cases [41,52]. The second approach is to estimate the absolute entropy based on an analytical approximation to the configurational probability distribution of the system, expressed in a specified set of generalized coordinates. This analytical function involves parameters (e.g. moments of the distribution) that can in principle be estimated from a single (sufficiently long) simulation of the system. This second approach was introduced by Karplus and Kushick , who applied it to bound (non-diffusive) systems (e.g. a single covalently linked macromolecule) on the basis of internal coordinates (subset of the distances, angles and torsional angles defining a configuration of the molecule, excluding overall translation and rotation) and with a multivariate Gaussian approximation for the configurational probability distribution (the parameters of which are the averages and covariances of the selected internal coordinates during the simulation). Because the normalized Gaussian is the probability distribution associated with a classical one-dimensional harmonic oscillator, this method was called quasi-harmonic analysis.
Configurational entropy from molecular dynamics simulations 89 individual coordinates (ii) correlations among the probability distributions associated with different coordinates (beyond the pairwise linear correlations accounted for in the harmonic model). These effects are neglected in the quasi-harmonic analysis and (nearly see Appendix A) always correspond to a negative entropy contribution. From a different perspective, the (positive) error associated with a quasi-harmonic entropy estimate may be viewed as arising from the approximation of the typically frustrated (many local minima and barriers) potential energy landscape characteristic of most molecular systems by a single (effective) harmonic basin . In favourable cases, better entropy estimates can be obtained through methods that explicitly take into account the multiple-minima structure of this landscape (e.g. mining-minima approach [53,64]). However, in the context of biomolecular systems with explicit solvation, the enumeration of all minima (or even of only the most relevant ones) is not possible. In these cases, the quasi-harmonic approach (possibly including corrections such as the ones described in the present article) is probably the only option currently available. The application of quasi-harmonic analysis in terms of Cartesian coordinates requires the removal of the overall (center of mass) translational motion from the sampled configurations (because the analysis assumes a bound system). If required, the translational entropy contribution can be later reintroduced using the appropriate quantum- mechanical expression (Sackur-Tetrode equation based on a specified standard state for the system volume or pressure). Although the removal of the overall rotational motion is in principle not required, it is recommended in practice , because: (i) overall rotation is a highly correlated motion (associated with a relatively small entropy contribution), while the quasi-harmonic analysis interprets it as a superposition of uncorrelated harmonic motions along the individual Cartesian coordinates (associated with a much larger entropy contribution) (ii) the tumbling time of macro- molecules is generally long , so that rotational motions cannot be sampled accurately on the timescales reachable nowadays in explicit-solvent simulations of these systems. However, the overall rotation of a flexible molecule cannot be separated transformation (e.g. for a Cartesian coordinate system). For such coordinate systems, the statement remains valid at the quantum- mechanical level, although the proof given by Schlitter  is incomplete . In other cases, this principle may be violated. However, the maximum entropy distribution will generally remain very close to that associated with a corresponding harmonic system and violations are only likely to be encountered for systems that are inherently very close to harmonicity, which is not the case of complex (macro)molecular systems. In the present article, the upper-bound property of the quasi-harmonic entropy estimate will be considered as being of general validity. Since the actual quasi-harmonic entropy estimate depends on the choice of a coordinate system (i.e. on the nature of the underlying harmonic model), the ���optimal��� coordinate system will be the one leading to the lowest estimate . Unfortunately, there is currently no systematic procedure for determining this optimal coordinate system. In view of the previously mentioned drawbacks of internal coordinates (and of the difficulty in applying metric-tensor-weighting to these coordinates, see below), the use of Cartesian coordinates probably represents a good compromise. Yet, this choice also has three main drawbacks: (i) ambiguity may arise when removing the overall translational and rotational motions from the configurations sampled during the simulation of a (macro) molecule (ii) the average molecular structure (equilibrium configuration in the harmonic model) may be less realistic when defined in terms of Cartesian coordinates (after removal of overall translational and rotational motions) compared to internal ones (iii) the different potential energy terms of a force field are generally associated with specific internal coordinates, so that these coordinates appear to be a more ���natural��� choice for the quasi-harmonic analysis. Indeed, calculations on model systems have shown that internal coordinates tend to result in lower (i.e. better) entropy estimates . Irrespective of the chosen coordinate system, the difference between the true entropy of the simulated system and its quasi-harmonic estimate arises from: (i) anharmonicities (i.e. non-Gaussian behavior) in the probability distributions along
90 Riccardo Baron et al. hydrocarbons was undertaken to compare the configurational spaces accessible using simulation models of different spatial resolutions . Applications to larger biomolecules involved the estimation of side-chain entropies in a model for a protein molten-globule state , as well as of the entropy changes upon binding a fatty acid to a fatty-acid binding protein , upon complexation of a protein with its receptor , upon binding of the CAP and ��-repressor to cognate DNA sequences , and upon binding of netropsin and distamycin ligands to the minor groove of a model DNA duplex . The practical implementation of quasi-harmonic analysis using a specific set of generalized coordinates requires the diagonalization of the corresponding metric-tensor-weighted covariance matrix as evaluated from the simulation (in a Cartesian coordinate system, metric-tensor- weighting is equivalent to mass-weighting). The resulting eigenvectors represent motional modes (quasi-harmonic modes) around the average system configuration, associated with probability distributions that have zero averages and are pairwise linearly uncorrelated. Furthermore, if the configurational probability distribution of the system is approximated by a multivariate Gaussian with the same (metric-tensor-weighted) covariance matrix as the real distribution, the probability distributions associated with the quasi-harmonic modes are Gaussian (of widths determined by the corresponding eigenvalues) and fully uncorrelated. In this case, the approximate system entropy (quasi-harmonic estimate) is a sum of harmonic entropy contributions associated with each of the quasi-harmonic modes. These contributions can be calculated analytically based on the corresponding eigenvalues, using the appropriate classical or (preferably) quantum-mechanical expression for the entropy of a one-dimensional harmonic oscillator. As suggested by Schlitter , the diagonalization process can be substituted by a determinant calculation if the correct quantum- mechanical formula is replaced by an approximate heuristic expression (that slightly overestimates the quasi-harmonic entropy). However, since the computational gain is only moderate and the quasi-harmonic modes can no longer be determined, the use of this approximate formula is probably not recommended in practice . from internal motions in a unique fashion [66-68], and this separation is always somewhat arbitrary [28,69,70]. In practice, the removal of overall translation and rotation from the sampled configurations is commonly performed by atom- positional least-squares fitting of successive structures along a trajectory onto a common reference structure [49,71,72]. Alternatively, simulations can be performed with constrained roto-translational motion . If necessary, the resulting quasi-harmonic entropy estimate can be (approximately) corrected using the quantum- mechanical expression for the entropy of a rigid rotor (based on the average inertia tensor of the molecule). The procedure outlined by Schlitter to evaluate quasi-harmonic entropies based on Cartesian coordinates  was implemented  and tested on simulations of reversible peptide folding in methanol at different temperatures . The results showed that the solute quasi-harmonic entropy does not give sufficient information on the thermodynamics of the folding process, because: (i) the quasi-harmonic entropy estimate is an upper bound to the true entropy of the simulated peptide, both in the folded and unfolded states, but the magnitude of the associated error is unknown and may differ between the two states (ii) solute- solvent and solvent-solvent correlations provide an unknown (but probably large) contribution to the entropy, which may also significantly differ between the folded and unfolded states. The same procedure was applied to investigate the influence of different solvents, folds and peptide chain lengths on the configurational entropy of carbopeptoids . The results evidenced significant dependence of the estimated solute entropy on the properties of the solvent (in contrast to an earlier suggestion ) and on the characteristics of the folded configuration. In another study, the configurational entropy was calculated for trialanine in water and compared with vibrational spectroscopy experiments to interpret the stability of different peptide conformations . The application of the Schlitter approach to fairly rigid organic compounds was also undertaken in order to estimate the translational and rotational contributions to the absolute entropy . Its application to liquid
Configurational entropy from molecular dynamics simulations 91 coordinates, but without mass-weighting of the covariance matrix prior to diagonalization. However, to our knowledge, the relationship between quasi- harmonic and essential modes (and between the probability distributions along these modes) has never been investigated so far. Finally, both quasi-harmonic and essential-mode analyses should be distinguished from normal- mode analysis [87-97]. This analysis, typically performed in terms of Cartesian coordinates (see, however, Section 2.1), relies on evaluating and diagonalizing the mass-weighted Hessian matrix (second derivative of the potential energy with respect to the atomic coordinates) associated with a single structure (usually corresponding to a stationary point on the potential-energy surface). Normal-mode analysis probes the vibrational modes associated with this single structure, i.e. a very local region of the configurational space, while quasi-harmonic and essential-mode analyses aim at characterizing the global extent of the configurational space accessible to the system at a given temperature (figure 1). Therefore, an entropy estimate based on normal-mode analysis As mentioned above (and more explicitly formulated in Section 2) the diagonalization process in a quasi-harmonic analysis must be applied to the metric-tensor-weighted covariance matrix. However, previous applications of this approach in terms of internal coordinates [16,44- 47,61,62] incorrectly omitted this weighting, leading to an error in the estimated entropy (the nature and magnitude of which is difficult to characterize) and to an inconsistency in the dimensionality of the estimated entropy (dependence on the arbitrary choice for the unit of mass). The above procedure for quasi-harmonic analysis bears strong analogies with the essential-mode analysis developed by the Berendsen group [83- 86]. The aim of this approach is to decompose the atom-positional fluctuations of a (bound) molecular system as obtained from a simulation into pairwise linearly uncorrelated motional modes (essential modes), associated with additive contributions to the total mean-square atom-positional fluctuations. In fact, essential-mode analysis is equivalent to a quasi-harmonic analysis applied in Cartesian Figure 1. Schematic comparison of the principles underlying normal-mode (left) and quasi-harmonic (right) analyses. Normal-mode analysis probes the local configurational space ( q ) around a single (usually stationary) point ( qo ), based on the local curvature of the potential energy surface ( (q ) V ) at this point. Quasi-harmonic analysis probes the global extent of configurational space ( q ) accessible to a molecular system at a given temperature ( B k T ).