Exploring the parameter space of the coarse-grained UNRES force field by random search: selecting a transferable medium-resolution force field.
ABSTRACT: We explored the energy-parameter space of our coarse-grained UNRES force field for large-scale ab initio simulations of protein folding, to obtain good initial approximations for hierarchical optimization of the force field with new virtual-bond-angle bending and side-chain-rotamer potentials which we recently introduced to replace the statistical potentials. 100 sets of energy-term weights were generated randomly, and good sets were selected by carrying out replica-exchange molecular dynamics simulations of two peptides with a minimal alpha-helical and a minimal beta-hairpin fold, respectively: the tryptophan cage (PDB code: 1L2Y) and tryptophan zipper (PDB code: 1LE1). Eight sets of parameters produced native-like structures of these two peptides. These eight sets were tested on two larger proteins: the engrailed homeodomain (PDB code: 1ENH) and FBP WW domain (PDB code: 1E0L); two sets were found to produce native-like conformations of these proteins. These two sets were tested further on a larger set of nine proteins with alpha or alpha + beta structure and found to locate native-like structures of most of them. These results demonstrate that, in addition to finding reasonable initial starting points for optimization, an extensive search of parameter space is a powerful method to produce a transferable force field.
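The random-search step described in this abstract can be illustrated with a minimal sketch. It assumes that each weight is drawn uniformly from a fixed interval and that a set is kept when a score (for example, an RMSD from the native structure) is below a cutoff for every test peptide. The term names, the sampling interval, the cutoff, and `toy_score` are hypothetical stand-ins: the abstract does not specify them, and in the actual study the score would come from replica-exchange molecular dynamics simulations.

```python
import random

# Hypothetical energy-term names; the abstract does not list the UNRES terms.
TERMS = ["side_chain", "electrostatic", "torsional", "bending", "rotamer"]


def sample_weight_sets(n_sets, low=0.0, high=2.0, seed=0):
    """Draw n_sets random weight sets, each weight uniform in [low, high]."""
    rng = random.Random(seed)
    return [{t: rng.uniform(low, high) for t in TERMS} for _ in range(n_sets)]


def select_good_sets(weight_sets, score_fn, cutoff):
    """Keep the sets whose score is at or below cutoff for every test system."""
    return [w for w in weight_sets if all(s <= cutoff for s in score_fn(w))]


def toy_score(weights):
    """Stand-in for a simulation: returns one pseudo-RMSD per test peptide."""
    return [10.0 * abs(weights["torsional"] - 1.0),
            10.0 * abs(weights["bending"] - 1.0)]


if __name__ == "__main__":
    candidates = sample_weight_sets(100)
    good = select_good_sets(candidates, toy_score, cutoff=2.0)
    print(f"{len(good)} of {len(candidates)} weight sets passed the filter")
```

Filtering on all test systems at once mirrors the requirement that a set reproduce both the alpha-helical and the beta-hairpin fold before it is tested on larger proteins.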
Project description:Coarse-grained molecular dynamics simulations offer a dramatic extension of the time-scale of simulations compared to all-atom approaches. In this article, we describe the use of the physics-based united-residue (UNRES) force field, developed in our laboratory, in protein-structure simulations. We demonstrate that this force field offers about a 4000-times extension of the simulation time scale; this feature arises both from averaging out the fast-moving degrees of freedom and reduction of the cost of energy and force calculations compared to all-atom approaches with explicit solvent. With massively parallel computers, microsecond folding simulation times of proteins containing about 1000 residues can be obtained in days. A straightforward application of canonical UNRES/MD simulations, demonstrated with the example of the N-terminal part of the B-domain of staphylococcal protein A (PDB code: 1BDD, a three-alpha-helix bundle), discerns the folding mechanism and determines kinetic parameters by parallel simulations of several hundred or more trajectories. Use of generalized-ensemble techniques, of which the multiplexed replica exchange method proved to be the most effective, enables us to compute thermodynamics of folding and carry out fully physics-based prediction of protein structure, in which the predicted structure is determined as a mean over the most populated ensemble below the folding-transition temperature. 
By using principal component analysis of the UNRES folding trajectories of the formin-binding protein WW domain (PDB code: 1E0L; a three-stranded antiparallel beta-sheet) and 1BDD, we identified representative structures along the folding pathways and demonstrated that only a few (low-indexed) principal components can capture the main structural features of a protein-folding trajectory; the potentials of mean force calculated along these essential modes exhibit multiple minima, whereas those along the remaining modes are unimodal. In addition, a comparison between the structures that are representative of the minima in the free-energy profile along the essential collective coordinates of protein folding (computed by principal component analysis) and the free-energy profile projected along the virtual-bond dihedral angles gamma of the backbone revealed the key residues involved in the transitions between the different basins of the folding free-energy profile, in agreement with existing experimental data for 1E0L.
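The principal component analysis and the potential of mean force along a mode can be sketched as follows. This is an illustration on synthetic data, not the analysis code used in the study: the trajectory is a made-up two-state signal plus noise, the function names `pca` and `pmf_along_mode` are hypothetical, and the potential of mean force is estimated simply as -kT ln of a histogram density.

```python
import numpy as np


def pca(trajectory):
    """PCA of a (n_frames, n_coords) array.

    Returns eigenvalues in descending order, eigenvectors as columns,
    and the projections of the centered frames onto the eigenvectors.
    """
    centered = trajectory - trajectory.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, centered @ evecs


def pmf_along_mode(projection, bins=30, kT=1.0):
    """Estimate a potential of mean force as -kT ln(density), shifted to min 0."""
    hist, edges = np.histogram(projection, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = hist > 0                          # skip empty bins (log undefined)
    pmf = -kT * np.log(hist[mask])
    return centers[mask], pmf - pmf.min()


# Synthetic two-state "trajectory": coordinate 0 hops between two basins.
rng = np.random.default_rng(0)
n_frames = 2000
traj = rng.normal(scale=0.5, size=(n_frames, 6))
traj[:, 0] += rng.choice([-3.0, 3.0], size=n_frames)

evals, evecs, proj = pca(traj)
explained = evals / evals.sum()
x, pmf = pmf_along_mode(proj[:, 0])
print("variance fraction of first mode:", round(float(explained[0]), 3))
```

In this toy case the first mode carries most of the variance and its profile has two minima, while the remaining modes are single-welled, which is the qualitative pattern the abstract describes.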
Project description:The conformational space annealing (CSA) method for global optimization has been applied to the 10-55 fragment of the B-domain of staphylococcal protein A (protein A) and to a 75-residue protein, apo calbindin D9K (PDB ID code), by using the UNRES off-lattice united-residue force field. Although the potential was not calibrated with these two proteins, the native-like structures were found among the low-energy conformations, without the use of threading or secondary-structure predictions. This is because the CSA method can find many distinct families of low-energy conformations. Starting from random conformations, the CSA method found that there are two families of low-energy conformations for each of the two proteins, the native-like fold and its mirror image. The CSA method converged to the same low-energy folds in all cases studied, as opposed to other optimization methods. It appears that the CSA method with the UNRES force field, which is based on the thermodynamic hypothesis, can be used in prediction of protein structures in real time.
Project description:A novel method of parameter optimization is proposed. It makes use of large sets of decoys generated for six nonhomologous proteins with different architecture. Parameter optimization is achieved by creating a free energy gap between sets of nativelike and nonnative conformations. The method is applied to optimize the parameters of a physics-based scoring function consisting of the all-atom ECEPP05 force field coupled with an implicit solvent model (a solvent-accessible surface area model). The optimized force field is able to discriminate near-native from nonnative conformations of the six training proteins when used either for local energy minimization or for short Monte Carlo simulated annealing runs after local energy minimization. The resulting force field is validated with an independent set of six nonhomologous proteins, and appears to be transferable to proteins not included in the optimization; i.e., for five out of the six test proteins, decoys with 1.7- to 4.0-Å all-heavy-atom root mean-square deviations emerge as those with the lowest energy. In addition, we examined the set of misfolded structures created by Park and Levitt using a four-state reduced model. The results from these additional calculations confirm the good discriminative ability of the optimized force field obtained with our decoy sets.
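One way to make the idea of a free energy gap between decoy sets concrete is the sketch below. It rests on assumptions not stated in the abstract: the energy of each decoy is taken to be a weighted sum of per-term energies, and the free energy of a set is taken as -kT ln of the Boltzmann sum over its members. The function names and the toy numbers are hypothetical, and the actual objective function of the method may differ.

```python
import numpy as np


def free_energy(energies, kT=1.0):
    """F = -kT ln sum_i exp(-E_i / kT), computed stably by shifting by min(E)."""
    e = np.asarray(energies, dtype=float)
    e_min = e.min()
    return e_min - kT * np.log(np.sum(np.exp(-(e - e_min) / kT)))


def free_energy_gap(weights, native_terms, nonnative_terms, kT=1.0):
    """Gap F(nonnative) - F(nativelike); positive values favor the nativelike set.

    native_terms and nonnative_terms are (n_decoys, n_terms) arrays of
    per-term energies; weights is the (n_terms,) parameter vector.
    """
    w = np.asarray(weights, dtype=float)
    e_native = np.asarray(native_terms, dtype=float) @ w
    e_nonnative = np.asarray(nonnative_terms, dtype=float) @ w
    return free_energy(e_nonnative, kT) - free_energy(e_native, kT)


# Toy decoy sets with two energy terms each (made-up numbers).
native = [[-1.0, 0.0], [-0.8, 0.1]]
nonnative = [[0.0, -1.0], [0.2, -0.9]]
print(free_energy_gap([1.0, 0.0], native, nonnative))  # positive gap
print(free_energy_gap([0.0, 1.0], native, nonnative))  # negative gap
```

An optimizer would then adjust the weights to enlarge this gap across all training proteins; that outer loop is omitted here.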
Project description:We present an extension of the all-atom internal-coordinate force field, ICMFF, that allows for simulation of heterogeneous systems including hexopyranose saccharides and glycan chains in addition to proteins. A library of standard glycan geometries containing α- and β-anomers of the most common hexopyranoses, i.e., d-galactose, d-glucose, d-mannose, d-xylose, l-fucose, N-acetylglucosamine, N-acetylgalactosamine, sialic, and glucuronic acids, is created based on the analysis of the saccharide structures reported in the Cambridge Structural Database. The new force field parameters include molecular electrostatic potential-derived partial atomic charges and the torsional parameters derived from quantum mechanical data for a collection of minimal molecular fragments and related molecules. The ϕ/ψ torsional parameters for different types of glycosidic linkages are developed using model compounds containing the key atoms in the full carbohydrates, i.e., glycosidic-linked tetrahydropyran-cyclohexane dimers. Target data for parameter optimization include two-dimensional energy surfaces corresponding to the ϕ/ψ glycosidic dihedral angles in the disaccharide analogues, as determined by quantum mechanical MP2/6-31G** single-point energies on HF/6-31G** optimized structures. To achieve better agreement with the observed geometries of glycosidic linkages, the bond angles at the O-linkage atoms are added to the internal variable set and the corresponding bond bending energy term is parametrized using quantum mechanical data. The resulting force field is validated on glycan chains of 1-12 residues from a set of high-resolution X-ray glycoprotein structures based on heavy atom root-mean-square deviations of the lowest-energy glycan conformations generated by the biased probability Monte Carlo (BPMC) molecular mechanics simulations from the native structures.
The appropriate BPMC distributions for monosaccharide-monosaccharide and protein-glycan linkages are derived from the extensive analysis of conformational properties of glycoprotein structures reported in the Protein Data Bank. Use of the BPMC search leads to significant improvements in sampling efficiency for glycan simulations. Moreover, good agreement with the X-ray glycoprotein structures is achieved for all glycan chain lengths. Thus, average/median RMSDs are 0.81/0.68 Å for one-residue glycans and 1.32/1.47 Å for three-residue glycans. RMSD from the native structure for the lowest-energy conformation of the 12-residue glycan chain (PDB ID 3og2) is 1.53 Å. Additionally, results obtained for free short oligosaccharides using the new force field are in line with the available experimental data, i.e., the most populated conformations in solution are predicted to be the lowest energy ones. The newly developed parameters allow for the accurate modeling of linear and branched hexopyranose glycosides in heterogeneous systems.
Project description:The Quantum Chemical Topological Force Field (QCTFF) uses the machine learning method kriging to map atomic multipole moments to the coordinates of all atoms in the molecular system. It is important that kriging operates on relevant and realistic training sets of molecular geometries. Therefore, we sampled single amino acid geometries directly from protein crystal structures stored in the Protein Data Bank (PDB). This sampling enhances the conformational realism (in terms of dihedral angles) of the training geometries. However, these geometries can be fraught with inaccurate bond lengths and valence angles due to artefacts of the refinement process of the X-ray diffraction patterns, combined with experimentally invisible hydrogen atoms. This is why we developed a hybrid PDB/nonstationary normal modes (NM) sampling approach called PDB/NM. This method is superior to standard NM sampling, which captures only geometries optimized from the stationary points of single amino acids in the gas phase. Indeed, PDB/NM combines the sampling of relevant dihedral angles with chemically correct local geometries. Geometries sampled using PDB/NM were used to build kriging models for alanine and lysine, and their prediction accuracy was compared to models built from geometries sampled from three other sampling approaches. Bond length variation, as opposed to variation in dihedral angles, puts pressure on prediction accuracy, potentially lowering it. Hence, the larger coverage of dihedral angles of the PDB/NM method does not deteriorate the predictive accuracy of kriging models, compared to the NM sampling around local energetic minima used so far in the development of QCTFF.
Project description:We present here the parmbsc0 force field, a refinement of the AMBER parm99 force field, where emphasis has been placed on the correct representation of the alpha/gamma concerted rotation in nucleic acids (NAs). The modified force field corrects overpopulations of the alpha/gamma = (g+,t) backbone that were seen in long (more than 10 ns) simulations with previous AMBER parameter sets (parm94-99). The force field has been derived by fitting to high-level quantum mechanical data and verified by comparison with very high-level quantum mechanical calculations and by a very extensive comparison between simulations and experimental data. The set of validation simulations includes two of the longest trajectories published to date for the DNA duplex (200 ns each) and the largest variety of NA structures studied to date (15 different NA families and 97 individual structures). The total simulation time used to validate the force field includes nearly 1 μs of state-of-the-art molecular dynamics simulations in aqueous solution.
Project description:The ff94 force field that is commonly associated with the Amber simulation package is one of the most widely used parameter sets for biomolecular simulation. After a decade of extensive use and testing, limitations in this force field, such as over-stabilization of alpha-helices, were reported by us and other researchers. This led to a number of attempts to improve these parameters, resulting in a variety of "Amber" force fields and significant difficulty in determining which should be used for a particular application. We show that several of these continue to suffer from inadequate balance between different secondary structure elements. In addition, the approach used in most of these studies neglected to account for the existence in Amber of two sets of backbone phi/psi dihedral terms. This led to parameter sets that provide unreasonable conformational preferences for glycine. We report here an effort to improve the phi/psi dihedral terms in the ff99 energy function. Dihedral term parameters are based on fitting the energies of multiple conformations of glycine and alanine tetrapeptides from high level ab initio quantum mechanical calculations. The new parameters for backbone dihedrals replace those in the existing ff99 force field. This parameter set, which we denote ff99SB, achieves a better balance of secondary structure elements as judged by improved distribution of backbone dihedrals for glycine and alanine with respect to PDB survey data. It also accomplishes improved agreement with published experimental data for conformational preferences of short alanine peptides and better accord with experimental NMR relaxation data of test protein systems.
Project description:BACKGROUND: Protein-DNA interactions are important for many cellular processes; however, structural knowledge for a large fraction of known and putative complexes is still lacking. Computational docking methods aim to predict the architecture of a complex given detailed structures of its constituents. They are becoming an increasingly important tool in the field of macromolecular assemblies, complementing the particularly demanding X-ray crystallography of protein-nucleic acid complexes and providing means for the refinement and integration of low-resolution data coming from rapidly advancing methods such as cryoelectron microscopy. RESULTS: We present a new coarse-grained force field suitable for protein-DNA docking. The force field is an extension of previously developed parameter sets for protein-RNA and protein-protein interactions. The docking is based on potential energy minimization in translational and orientational degrees of freedom of the binding partners. It allows for a fast and efficient systematic search for native-like complex geometry without any prior knowledge regarding binding site location. CONCLUSIONS: We find that the force field gives very good results for bound docking. The quality of predictions in the case of unbound docking varies, depending on the level of structural deviation from bound geometries. We analyze the role of specific protein-DNA interactions in force field performance, both with respect to complex structure prediction and the reproduction of experimental binding affinities. We find that such direct, specific interactions only partially contribute to protein-DNA recognition, indicating an important role of shape complementarity and sequence-dependent DNA internal energy, in line with the concept of an indirect protein-DNA readout mechanism.
Project description:Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for alpha-carbon virtual bond opening and dihedral angles, pair-wise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as alpha-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways.
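The RMSD comparison between a cluster center and the native structure can be sketched with the standard Kabsch superposition, shown below for alpha-carbon coordinates. The abstract does not say which superposition procedure was used, so this is an assumption; the function name `kabsch_rmsd` and the random test coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition.

    Both sets are centered, then P is rotated onto Q with the proper rotation
    (no reflection) that minimizes the RMSD (Kabsch algorithm).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc                     # rotate each row of Pc by R
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# A 12-residue toy "peptide": random C-alpha coordinates.
rng = np.random.default_rng(0)
native_ca = rng.normal(size=(12, 3))

# The same structure rotated about z and translated should give RMSD ~ 0.
a = 0.7
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
moved = native_ca @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(native_ca, moved))
```

Excluding reflections matters for folding studies, because a mirror-image fold would otherwise superimpose perfectly onto the native structure.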