Binding Thermodynamics of Host-Guest Systems with SMIRNOFF99Frosst 1.0.5 from the Open Force Field Initiative.
ABSTRACT: Designing ligands that bind their target biomolecules with high affinity and specificity is a key step in small-molecule drug discovery, but accurately predicting protein-ligand binding free energies remains challenging. Key sources of errors in the calculations include inadequate sampling of conformational space, ambiguous protonation states, and errors in force fields. Noncovalent complexes between a host molecule with a binding cavity and a druglike guest molecule have emerged as powerful model systems. As model systems, host-guest complexes reduce many of the errors in more complex protein-ligand binding systems, as their small size greatly facilitates conformational sampling, and one can choose systems that avoid ambiguities in protonation states. These features, combined with their ease of experimental characterization, make host-guest systems ideal model systems to test and ultimately optimize force fields in the context of binding thermodynamics calculations. The Open Force Field Initiative aims to create a modern, open software infrastructure for automatically generating and assessing force fields using data sets. The first force field to arise out of this effort, named SMIRNOFF99Frosst, has approximately one tenth the number of parameters, in version 1.0.5, compared to typical general small molecule force fields, such as GAFF. Here, we evaluate the accuracy of this initial force field, using free energy calculations of 43 ? and ?-cyclodextrin host-guest pairs for which experimental thermodynamic data are available, and compare with matched calculations using two versions of GAFF. For all three force fields, we used TIP3P water and AM1-BCC charges. The calculations are performed using the attach-pull-release (APR) method as implemented in the open source package, pAPRika. For binding free energies, the root-mean-square error of the SMIRNOFF99Frosst calculations relative to experiment is 0.9 [0.7, 1.1] kcal/mol, while the corresponding results for GAFF 1.7 and GAFF 2.1 are 0.9 [0.7, 1.1] kcal/mol and 1.7 [1.5, 1.9] kcal/mol, respectively, with 95% confidence ranges in brackets. These results suggest that SMIRNOFF99Frosst performs competitively with existing small molecule force fields and is a parsimonious starting point for optimization.
Project description:Understanding the fine balance between changes of entropy and enthalpy and the competition between a guest and water molecules in molecular binding is crucial in fundamental studies and practical applications. Experiments provide measurements. However, illustrating the binding/unbinding processes gives a complete picture of molecular recognition not directly available from experiments, and computational methods bridge the gaps. Here, we investigated guest association/dissociation with ?-cyclodextrin (?-CD) by using microsecond-time-scale molecular dynamics (MD) simulations, postanalysis and numerical calculations. We computed association and dissociation rate constants, enthalpy, and solvent and solute entropy of binding. All the computed values of kon, koff, ?H, ?S, and ?G using GAFF-CD and q4MD-CD force fields for ?-CD could be compared with experimental data directly and agreed reasonably with experiment findings. In addition, our study further interprets experiments. Both force fields resulted in similar computed ?G from independently computed kinetics rates, ?G = -RT?ln(kon·C0/koff), and thermodynamics properties, ?G = ?H - T?S. The water entropy calculations show that the entropy gain of desolvating water molecules are a major driving force, and both force fields have the same strength of nonpolar attractions between solutes and ?-CD as well. Water molecules play a crucial role in guest binding to ?-CD. However, collective water/?-CD motions could contribute to different computed kon and ?H values by different force fields, mainly because the parameters of ?-CD provide different motions of ?-CD, hydrogen-bond networks of water molecules in the cavity of free ?-CD, and strength of desolvation penalty. As a result, q4MD-CD suggests that guest binding is mostly driven by enthalpy, while GAFF-CD shows that gaining entropy is the major driving force of binding. The study deepens our understanding of ligand-receptor recognition and suggests strategies for force field parametrization for accurately modeling molecular systems.
Project description:Computational prediction of noncovalent binding free energies with methods based on molecular mechanical force fields has become increasingly routine in drug discovery projects, where they promise to speed the discovery of small molecule ligands to bind targeted proteins with high affinity. Because the reliability of free energy methods still has significant room for improvement, new force fields, or modifications of existing ones, are regularly introduced with the aim of improving the accuracy of molecular simulations. However, comparatively little work has been done to systematically assess how well force fields perform, particularly in relation to the calculation of binding affinities. Hardware advances have made these calculations feasible, but comprehensive force field assessments for protein-ligand sized systems still remain costly. Here, we turn to cyclodextrin host-guest systems, which feature many hallmarks of protein-ligand binding interactions but are generally much more tractable due to their small size. We present absolute binding free energy and enthalpy calculations, using the attach-pull-release (APR) approach, on a set of 43 cyclodextrin-guest pairs for which experimental ITC data are available. The test set comprises both ?- and ?-cyclodextrin hosts binding a series of small organic guests, each with one of three functional groups: ammonium, alcohol, or carboxylate. Four water models are considered (TIP3P, TIP4Pew, SPC/E, and OPC), along with two partial charge assignment procedures (RESP and AM1-BCC) and two cyclodextrin host force fields. The results suggest a complex set of considerations when choosing a force field for biomolecular simulations. For example, some force field combinations clearly outperform others at the binding enthalpy calculations but not for the binding free energy. Additionally, a force field combination which we expected to be the worst performer gave the most accurate binding free energies - but the least accurate binding enthalpies. The results have implications for the development of improved force fields, and we propose this test set, and potential future elaborations of it, as a powerful validation suite to evaluate new force fields and help guide future force field development.
Project description:Meaningful efforts in computer-aided drug design (CADD) require accurate molecular mechanical force fields to quantitatively characterize protein-ligand interactions, ligand hydration free energies, and other ligand physical properties. Atomic models of new compounds are commonly generated by analogy from the predefined tabulated parameters of a given force field. Two widely used approaches following this strategy are the General Amber Force Field (GAFF) and the CHARMM General Force Field (CGenFF). An important limitation of using pretabulated parameter values is that they may be inadequate in the context of a specific molecule. To resolve this issue, we previously introduced the General Automated Atomic Model Parameterization (GAAMP) for automatically generating the parameters of atomic models of small molecules, using the results from ab initio quantum mechanical (QM) calculations as target data. The GAAMP protocol uses QM data to optimize the bond, valence angle, and dihedral angle internal parameters, and atomic partial charges. However, since the treatment of van der Waals interactions based on QM is challenging and may often be unreliable, the Lennard-Jones 6-12 parameters are kept unchanged from the initial atom types assignments (GAFF or CGenFF), which limits the accuracy that can be achieved by these models. To address this issue, a new set of Lennard-Jones 6-12 parameters was systematically optimized to reproduce experimental neat liquid densities and enthalpies of vaporization for a large set of 430 compounds, covering a wide range of chemical functionalities. Calculations of the hydration free energy indicate that optimal accuracy for these models is achieved when the molecule-water van der Waals dispersion is rescaled by a factor of 1.115. The final optimized model yields an average unsigned error of 0.79 kcal/mol in the hydration free energies.
Project description:The absolute binding free energies and binding enthalpies of twelve host-guest systems in the SAMPL5 blind challenge were computed using our attach-pull-release (APR) approach. This method has previously shown good correlations between experimental and calculated binding data in retrospective studies of cucurbituril (CB7) and ?-cyclodextrin (?CD) systems. In the present work, the computed binding free energies for host octa acid (OA or OAH) and tetra-endo-methyl octa-acid (TEMOA or OAMe) with guests are in good agreement with prospective experimental data, with a coefficient of determination (R2) of 0.8 and root-mean-squared error of 1.7 kcal/mol using the TIP3P water model. The binding enthalpy calculations achieve moderate correlations, with R2 of 0.5 and RMSE of 2.5 kcal/mol, for TIP3P water. Calculations using the newly developed OPC water model also show good performance. Furthermore, the present calculations semi-quantitatively capture the experimental trend of enthalpy-entropy compensation observed, and successfully predict guests with the strongest and weakest binding affinity. The most populated binding poses of all twelve systems, based on clustering analysis of 750 ns molecular dynamics (MD) trajectories, were extracted and analyzed. Computational methods using MD simulations and explicit solvent models in a rigorous statistical thermodynamic framework, like APR, can generate reasonable predictions of binding thermodynamics. Especially with continuing improvement in simulation force fields, such methods hold the promise of making substantial contributions to hit identification and lead optimization in the drug discovery process.
Project description:Computer simulations of biomolecular systems often use force fields, which are combinations of simple empirical atom-based functions to describe the molecular interactions. Even though polarizable force fields give a more detailed description of intermolecular interactions, nonpolarizable force fields, developed several decades ago, are often still preferred because of their reduced computation cost. Electrostatic interactions play a major role in biomolecular systems and are therein described by atomic point charges. In this work, we address the performance of different atomic charges to reproduce experimental hydration free energies in the FreeSolv database in combination with the GAFF force field. Atomic charges were calculated by two atoms-in-molecules approaches, Hirshfeld-I and Minimal Basis Iterative Stockholder (MBIS). To account for polarization effects, the charges were derived from the solute's electron density computed with an implicit solvent model, and the energy required to polarize the solute was added to the free energy cycle. The calculated hydration free energies were analyzed with an error model, revealing systematic errors associated with specific functional groups or chemical elements. The best agreement with the experimental data is observed for the AM1-BCC and the MBIS atomic charge methods. The latter includes the solvent polarization and presents a root-mean-square error of 2.0 kcal mol-1 for the 613 organic molecules studied. The largest deviation was observed for phosphorus-containing molecules and the molecules with amide, ester and amine functional groups.
Project description:Accurately predicting the binding affinities of small organic molecules to biological macromolecules can greatly accelerate drug discovery by reducing the number of compounds that must be synthesized to realize desired potency and selectivity goals. Unfortunately, the process of assessing the accuracy of current computational approaches to affinity prediction against binding data to biological macromolecules is frustrated by several challenges, such as slow conformational dynamics, multiple titratable groups, and the lack of high-quality blinded datasets. Over the last several SAMPL blind challenge exercises, host-guest systems have emerged as a practical and effective way to circumvent these challenges in assessing the predictive performance of current-generation quantitative modeling tools, while still providing systems capable of possessing tight binding affinities. Here, we present an overview of the SAMPL6 host-guest binding affinity prediction challenge, which featured three supramolecular hosts: octa-acid (OA), the closely related tetra-endo-methyl-octa-acid (TEMOA), and cucurbituril (CB8), along with 21 small organic guest molecules. A total of 119 entries were received from ten participating groups employing a variety of methods that spanned from electronic structure and movable type calculations in implicit solvent to alchemical and potential of mean force strategies using empirical force fields with explicit solvent models. While empirical models tended to obtain better performance than first-principle methods, it was not possible to identify a single approach that consistently provided superior results across all host-guest systems and statistical metrics. Moreover, the accuracy of the methodologies generally displayed a substantial dependence on the system considered, emphasizing the need for host diversity in blind evaluations. Several entries exploited previous experimental measurements of similar host-guest systems in an effort to improve their physical-based predictions via some manner of rudimentary machine learning; while this strategy succeeded in reducing systematic errors, it did not correspond to an improvement in statistical correlation. Comparison to previous rounds of the host-guest binding free energy challenge highlights an overall improvement in the correlation obtained by the affinity predictions for OA and TEMOA systems, but a surprising lack of improvement regarding root mean square error over the past several challenge rounds. The data suggests that further refinement of force field parameters, as well as improved treatment of chemical effects (e.g., buffer salt conditions, protonation states), may be required to further enhance predictive accuracy.
Project description:The ability to computationally predict protein-small molecule binding affinities with high accuracy would accelerate drug discovery and reduce its cost by eliminating rounds of trial-and-error synthesis and experimental evaluation of candidate ligands. As academic and industrial groups work toward this capability, there is an ongoing need for datasets that can be used to rigorously test new computational methods. Although protein-ligand data are clearly important for this purpose, their size and complexity make it difficult to obtain well-converged results and to troubleshoot computational methods. Host-guest systems offer a valuable alternative class of test cases, as they exemplify noncovalent molecular recognition but are far smaller and simpler. As a consequence, host-guest systems have been part of the prior two rounds of SAMPL prediction exercises, and they also figure in the present SAMPL5 round. In addition to being blinded, and thus avoiding biases that may arise in retrospective studies, the SAMPL challenges have the merit of focusing multiple researchers on a common set of molecular systems, so that methods may be compared and ideas exchanged. The present paper provides an overview of the host-guest component of SAMPL5, which centers on three different hosts, two octa-acids and a glycoluril-based molecular clip, and two different sets of guest molecules, in aqueous solution. A range of methods were applied, including electronic structure calculations with implicit solvent models; methods that combine empirical force fields with implicit solvent models; and explicit solvent free energy simulations. The most reliable methods tend to fall in the latter class, consistent with results in prior SAMPL rounds, but the level of accuracy is still below that sought for reliable computer-aided drug design. Advances in force field accuracy, modeling of protonation equilibria, electronic structure methods, and solvent models, hold promise for future improvements.
Project description:In protein-ligand binding, the electrostatic environments of the two binding partners may vary significantly in bound and unbound states, which may lead to protonation changes upon binding. In cases where ligand binding results in a net uptake or release of protons, the free energy of binding is pH-dependent. Nevertheless, conventional free energy calculations and molecular docking protocols typically do not rigorously account for changes in protonation that may occur upon ligand binding. To address these shortcomings, we present a simple methodology based on Wyman's binding polynomial formalism to account for the pH dependence of binding free energies and demonstrate its use on cucurbituril (CB) host-guest systems. Using constant pH molecular dynamics and a reference binding free energy that is taken either from experiment or from thermodynamic integration computations, the pH-dependent binding free energy is determined. This computational protocol accurately captures the large pKa shifts observed experimentally upon CB:guest association and reproduces experimental binding free energies at different levels of pH. We show that incorrect assignment of fixed protonation states in free energy computations can give errors of >2 kcal/mol in these host-guest systems. Use of the methods presented here avoids such errors, thus suggesting their utility in computing proton-linked binding free energies for protein-ligand complexes.
Project description:Accurately calculating the entropy of liquids is an important goal, given that many processes take place in the liquid phase. Of almost equal importance is understanding the values obtained. However, there are few methods that can calculate the entropy of such systems, and fewer still to make sense of the values obtained. We present our multiscale cell correlation (MCC) method to calculate the entropy of liquids from molecular dynamics simulations. The method uses forces and torques at the molecule and united-atom levels and probability distributions of molecular coordinations and conformations. The main differences with previous work are the consistent treatment of the mean-field cell approximation to the approriate degrees of freedom, the separation of the force and torque covariance matrices, and the inclusion of conformation correlation for molecules with multiple dihedrals. MCC is applied to a broader set of 56 important industrial liquids modeled using the Generalized AMBER Force Field (GAFF) and Optimized Potentials for Liquid Simulations (OPLS) force fields with 1.14*CM1A charges. Unsigned errors versus experimental entropies are 8.7 J K - 1 mol - 1 for GAFF and 9.8 J K - 1 mol - 1 for OPLS. This is significantly better than the 2-Phase Thermodynamics method for the subset of molecules in common, which is the only other method that has been applied to such systems. MCC makes clear why the entropy has the value it does by providing a decomposition in terms of translational and rotational vibrational entropy and topographical entropy at the molecular and united-atom levels.
Project description:We present a strategy for carrying out high-precision calculations of binding free energy and binding enthalpy values from molecular dynamics simulations with explicit solvent. The approach is used to calculate the thermodynamic profiles for binding of nine small molecule guests to either the cucurbituril (CB7) or ?-cyclodextrin (?CD) host. For these systems, calculations using commodity hardware can yield binding free energy and binding enthalpy values with a precision of ?0.5 kcal/mol (95% CI) in a matter of days. Crucially, the self-consistency of the approach is established by calculating the binding enthalpy directly, via end point potential energy calculations, and indirectly, via the temperature dependence of the binding free energy, i.e., by the van't Hoff equation. Excellent agreement between the direct and van't Hoff methods is demonstrated for both host-guest systems and an ion-pair model system for which particularly well-converged results are attainable. Additionally, we find that hydrogen mass repartitioning allows marked acceleration of the calculations with no discernible cost in precision or accuracy. Finally, we provide guidance for accurately assessing numerical uncertainty of the results in settings where complex correlations in the time series can pose challenges to statistical analysis. The routine nature and high precision of these binding calculations opens the possibility of including measured binding thermodynamics as target data in force field optimization so that simulations may be used to reliably interpret experimental data and guide molecular design.