A free-energy approach for all-atom protein simulation.
ABSTRACT: All-atom free-energy methods offer a promising alternative to kinetic molecular mechanics simulations of protein folding and association. Here we report an accurate, transferable all-atom biophysical force field (PFF02) that stabilizes the native conformation of a wide range of proteins as the global optimum of the free-energy landscape. For 32 proteins of the ROSETTA decoy set and six proteins that we have previously folded with PFF01, we find near-native conformations with an average backbone RMSD of 2.14 Å to the native conformation and an average Z-score of -3.46 relative to the corresponding decoy set. We used nonequilibrium sampling techniques starting from completely extended conformations to exhaustively sample the energy surfaces of three nonhomologous hairpin peptides, a three-stranded β-sheet, the all-helical 40-amino-acid HIV accessory protein, and a zinc-finger ββα motif, and found that the minimum-energy conformation of each protein is near-native. Using a massively parallel evolutionary algorithm, we also obtain a near-native low-energy conformation for the 54-amino-acid engrailed homeodomain. Our force field thus stabilized near-native conformations for a total of 20 proteins of all structure classes, with an average RMSD of only 3.06 Å to their respective experimental conformations.
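The Z-score quoted above expresses how far the energy of the near-native conformation lies below the energies of the decoy set, in units of the decoy-set standard deviation. A minimal sketch, assuming plain float energies and the population standard deviation (the abstract does not say which estimator was used); the numbers in the usage line are toy values, not data from the study:

```python
from statistics import fmean, pstdev


def energy_z_score(native_energy: float, decoy_energies: list[float]) -> float:
    """Z = (E_native - mean(E_decoys)) / std(E_decoys).

    Negative values mean the near-native conformation lies below the decoy
    ensemble. Using the population standard deviation is an assumption.
    """
    mu = fmean(decoy_energies)
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mu) / sigma


# Toy energies, not values from the study.
print(round(energy_z_score(-120.0, [-100.0, -95.0, -105.0, -90.0, -110.0]), 2))  # -2.83
```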
Project description: BACKGROUND: The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein force field (PFF01) that we used to fold several small proteins from completely extended conformations. Because the computational cost of de novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01. RESULTS: We use PFF01 to rank and cluster the conformations for 32 proteins generated by ROSETTA. For the 22 high-quality and 10 low-quality decoy sets, we select near-native conformations with an average Cα root-mean-square deviation (RMSD) of 3.03 Å and 6.04 Å, respectively. The protocol incorporates an inherent reliability indicator that succeeds for 78% of the decoy sets. In over 90% of these cases, near-native conformations are selected from the decoy set. This success rate is rationalized by the quality of the decoys and the selectivity of the PFF01 force field, which ranks near-native conformations on average 3.06 standard deviations below the mean energy of the relaxed decoys (Z-score). CONCLUSION: All-atom free-energy relaxation with PFF01 emerges as a powerful low-cost approach toward generic de novo protein structure prediction. The approach can be applied to large all-atom decoy sets of any origin and requires no preexisting structural information to identify the native conformation. The study provides evidence that a large class of proteins may be foldable by PFF01.
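Ranking and clustering relaxed decoys by energy can be sketched as greedy leader clustering in energy order: the lowest-energy unassigned conformation seeds a cluster and absorbs every unassigned conformation within an RMSD cutoff. This is an illustration under assumptions, not the published PFF01 protocol; the function name `energy_ranked_clusters` and the 3.0 Å default cutoff are hypothetical, and the pairwise RMSD matrix is taken as precomputed.

```python
import numpy as np


def energy_ranked_clusters(energies, rmsd, cutoff=3.0):
    """Greedy leader clustering visited in order of increasing energy.

    energies: (N,) relaxed energies; rmsd: (N, N) symmetric pairwise RMSD
    matrix in Å with a zero diagonal. Returns a list of
    (center_index, member_indices), lowest-energy cluster center first.
    """
    energies = np.asarray(energies, float)
    rmsd = np.asarray(rmsd, float)
    unassigned = np.ones(len(energies), dtype=bool)
    clusters = []
    for i in np.argsort(energies):  # lowest energy first
        if not unassigned[i]:
            continue
        # The center itself is included because rmsd[i, i] == 0.
        members = np.flatnonzero(unassigned & (rmsd[i] <= cutoff))
        unassigned[members] = False
        clusters.append((int(i), members.tolist()))
    return clusters
```

The center of the first cluster is then the lowest-energy conformation, and cluster sizes give an additional, purely heuristic, indication of how well populated each low-energy basin is.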
Project description: BACKGROUND: A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, the scores correlate poorly with the accuracy of the generated conformations. RESULTS: We present a simple, nonparametric formula to estimate the accuracy of predicted conformations (decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Cα root-mean-square deviation (RMSD) calculation within a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores correlate with the Cα RMSDs of the decoy conformations, measured relative to the corresponding experimental conformation, with coefficients as high as 0.9. We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function, called self-RAPDF, in which the atom-atom contact probabilities are compiled from all the conformations in a decoy set, weighted according to their density scores, instead of from an ensemble of native conformations. The self-RAPDF has a higher correlation with Cα RMSD than RAPDF for 76/83 decoy sets and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement.
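The density score is described only as an all-against-all Cα RMSD calculation over the decoy set. A minimal sketch, assuming (hypothetically) that a decoy's density score is its mean RMSD to all other decoys, so that lower values mark densely populated regions of conformation space; the pairwise RMSD matrix is assumed precomputed, and the helper names are illustrative:

```python
import numpy as np


def density_scores(pairwise_rmsd):
    """Mean Cα RMSD from each decoy to every other decoy in the set.

    pairwise_rmsd: (N, N) symmetric all-against-all RMSD matrix with a zero
    diagonal. Lower scores mark decoys in densely populated regions.
    Hypothetical formula; the published density score may differ.
    """
    d = np.asarray(pairwise_rmsd, float)
    n = d.shape[0]
    return d.sum(axis=1) / (n - 1)  # divide by N-1 to skip the zero self-distance


def correlation_with_native(scores, rmsd_to_native):
    """Pearson correlation between density scores and RMSD to the native."""
    return float(np.corrcoef(scores, rmsd_to_native)[0, 1])
```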
CONCLUSIONS: Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.
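The self-RAPDF idea of compiling contact statistics from the decoy ensemble itself can be sketched as a RAPDF-style log-odds score, -ln[P(bin | type pair) / P(bin)], built from weighted distance histograms over the decoys. Everything here is an assumption for illustration: the function name, the integer atom-type encoding, the pseudocount, the distance bins, and the per-decoy `weights` (which could, for example, be derived from density scores) are not taken from the published method.

```python
import numpy as np


def self_rapdf(decoys, atom_types, weights, bin_edges, pseudocount=1.0):
    """Score decoys with log-odds contact statistics compiled from the decoy set.

    decoys: (M, N, 3) coordinates of M decoys with N atoms each.
    atom_types: length-N integer type labels (0..T-1), shared by all decoys.
    weights: (M,) non-negative per-decoy weights.
    bin_edges: distance-bin edges in Å; pairs outside the range are ignored.
    Returns (M,) scores; lower means more consistent with the ensemble.
    """
    decoys = np.asarray(decoys, float)
    types = np.asarray(atom_types)
    n_types = int(types.max()) + 1
    n_bins = len(bin_edges) - 1
    iu, ju = np.triu_indices(decoys.shape[1], k=1)  # each atom pair once
    ta = np.minimum(types[iu], types[ju])           # unordered type pair
    tb = np.maximum(types[iu], types[ju])

    def binned(x):
        d = np.linalg.norm(x[iu] - x[ju], axis=1)
        b = np.digitize(d, bin_edges) - 1           # -1 or n_bins means out of range
        ok = (b >= 0) & (b < n_bins)
        return ta[ok], tb[ok], b[ok]

    # Weighted counts N(type_a, type_b, bin) accumulated over the ensemble.
    counts = np.full((n_types, n_types, n_bins), pseudocount)
    for x, w in zip(decoys, weights):
        a_idx, b_idx, bins = binned(x)
        np.add.at(counts, (a_idx, b_idx, bins), w)

    p_bin_given_types = counts / counts.sum(axis=2, keepdims=True)
    p_bin = counts.sum(axis=(0, 1)) / counts.sum()
    log_odds = -np.log(p_bin_given_types / p_bin)   # RAPDF-style pseudo-energy

    scores = []
    for x in decoys:
        a_idx, b_idx, bins = binned(x)
        scores.append(log_odds[a_idx, b_idx, bins].sum())
    return np.array(scores)
```

In this toy form a decoy scores well when its type-specific distances are over-represented in the ensemble, which is the sense in which ensemble information substitutes for a database of native structures.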
Project description: BACKGROUND: We present a simple method to train a potential function for the protein folding problem which, even though trained using a small number of proteins, places a large number of native conformations near a local minimum. The training generates decoys by energy minimization of the native conformations with the current potential and uses a physically meaningful objective function (the derivative of the energy with respect to the torsion angles at the native conformation) in the quadratic programming step that places the native conformation near a local minimum. RESULTS: We also compare the performance of three different types of energy functions and find that while the pairwise energy function is trainable, a solvation energy function by itself is untrainable if decoys are generated by minimizing the current potential starting at the native conformation. The best results are obtained when a pairwise interaction energy function is combined with a solvation energy function. CONCLUSIONS: We are able to train a potential function using six proteins that, out of a total of 91 proteins, places 42 native conformations within approximately 4 Å RMSD and 71 native conformations within approximately 6 Å RMSD of a local minimum. Furthermore, a threading test using the same 91 proteins ranks 89 native conformations first and the remaining two second.
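If the energy is linear in its parameters, E = w·φ for a feature vector φ (for example, counts of residue-pair contacts), then requiring each native conformation to lie below its decoys becomes a set of linear constraints w·(φ_decoy - φ_native) > 0. The sketch below satisfies such constraints with a plain perceptron update, a simpler technique swapped in for the quadratic program and torsion-derivative objective described above; the linear energy form, the feature representation and the function name are hypothetical.

```python
import numpy as np


def train_pairwise_weights(native_feats, decoy_feats, epochs=1000, lr=1.0):
    """Find w so that E = w·φ ranks every native below each of its decoys.

    native_feats: (K, F) features of K native conformations.
    decoy_feats: list of K arrays, each (D_k, F), the decoys of protein k.
    Perceptron updates on violated constraints w·(φ_decoy - φ_native) <= 0
    (a stand-in for quadratic programming). Returns (w, converged).
    """
    diffs = np.vstack([d - n for n, d in zip(native_feats, decoy_feats)])
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        violated = False
        for delta in diffs:
            if w @ delta <= 0:  # decoy not strictly above its native
                w += lr * delta
                violated = True
        if not violated:
            return w, True
    return w, False
```

Unlike a quadratic program, the perceptron stops at the first feasible w and does not maximize any margin, so it only illustrates the constraint structure, not the published training objective.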
Project description: How to refine a near-native structure to bring it closer to its native conformation is an unsolved problem in the prediction of protein structures and protein-protein complex structures. In this article, we first test several scoring functions for selecting locally resampled near-native protein-protein docking conformations and then propose a computationally efficient protocol for structure refinement via local resampling and energy minimization. The proposed method employs a statistical energy function based on a Distance-scaled Ideal-gas REference state (DFIRE) as an initial filter and an empirical energy function, EMPIRE (EMpirical Protein-InteRaction Energy), for optimization and re-ranking. The final top-1 ranked structures improve significantly over the initial near-native structures in the ZDOCK 2.3 decoy set for Benchmark 1.0: the global RMSD decreases by 0.5 Å or more for 74% of cases and increases by 0.5 Å or more for only 7%. The improvement is less significant for Benchmark 2.0 (38% versus 33%). Possible reasons are discussed.
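DFIRE compares observed atom-pair counts with a distance-scaled ideal-gas reference state, u(r) = -RT ln[N_obs(r) / ((r/r_cut)^α (Δr/Δr_cut) N_obs(r_cut))]. A minimal sketch for a single atom-type pair, assuming the commonly quoted exponent α = 1.61, the last distance bin as the reference shell at r_cut, and an illustrative RT of 0.596 kcal/mol (about 300 K); the overall scaling factor of the published potential is omitted:

```python
import numpy as np


def dfire_potential(counts, bin_edges, alpha=1.61, rt=0.596):
    """DFIRE-style pair potential for one atom-type pair.

    counts[k]: observed pair count in distance bin k (bin_edges in Å).
    The last bin serves as the reference shell at r_cut (a simplification).
    Empty bins give +inf.
    """
    counts = np.asarray(counts, float)
    edges = np.asarray(bin_edges, float)
    r = 0.5 * (edges[:-1] + edges[1:])  # bin centers
    dr = np.diff(edges)                 # bin widths
    r_cut, dr_cut, n_cut = r[-1], dr[-1], counts[-1]
    # Ideal-gas-like reference count, scaled as r**alpha instead of r**2.
    expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_cut
    with np.errstate(divide="ignore"):
        return -rt * np.log(counts / expected)
```

Bins populated more than the reference state predicts receive negative (favorable) energies, and by construction the reference bin itself is zero.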
Project description: Loops in proteins are flexible regions connecting regular secondary structures. They are often involved in protein functions through interactions with other molecules. The irregularity and flexibility of loops make their structures difficult to determine experimentally and challenging to model computationally. Conformation sampling and energy evaluation are the two key components of loop modeling. We have developed a new method for loop conformation sampling and prediction based on a chain-growth sequential Monte Carlo sampling strategy, called Distance-guided Sequential chain-Growth Monte Carlo (DISGRO). With an energy function designed specifically for loops, our method can efficiently generate high-quality, low-energy loop conformations that are enriched in near-native loop structures. For 12-residue loops, the average minimum global backbone RMSD among 1,000 conformations is 1.53 Å, the lowest-energy RMSD is 2.99 Å, and the average ensemble RMSD is 5.23 Å. A novel geometric criterion is applied to speed up calculations. The computational cost of generating 1,000 conformations for each of the x loops in a benchmark dataset is only about 10 CPU minutes for 12-residue loops, compared to about 180 CPU minutes using the FALCm method. Test results on benchmark datasets show that DISGRO performs comparably to or better than previous successful methods, while requiring far less computing time. DISGRO is especially effective in modeling longer loops (10-17 residues).
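Sequential chain growth builds a loop one residue at a time and keeps an importance weight for the path it took. The sketch below is a generic Cα-only version with Rosenbluth-style weights, not DISGRO itself: candidates are pruned when they clash or can no longer reach the C-terminal anchor with the remaining bonds, a simple distance guide. The 3.8 Å virtual bond, the 3.0 Å clash threshold, uniform selection among feasible candidates and approximate closure are all assumptions; DISGRO additionally uses a loop-specific energy function.

```python
import numpy as np

BOND = 3.8   # approximate Cα-Cα virtual-bond length in Å (assumed)
CLASH = 3.0  # hypothetical minimum distance to non-adjacent Cα atoms


def grow_loop(start, end, n_res, n_trials=20, rng=None):
    """Grow n_res Cα positions from anchor `start` toward anchor `end`.

    Each step draws n_trials random bond directions, discards candidates
    that clash or cannot reach `end` with the bonds left (distance guide),
    picks one feasible candidate uniformly and adds log(n_feasible/n_trials)
    to the log Rosenbluth weight. Closure is approximate: the last residue
    must lie within BOND of `end`. Returns (coords, log_weight), or
    (None, -inf) if the chain dies.
    """
    rng = np.random.default_rng(rng)
    end = np.asarray(end, float)
    chain = [np.asarray(start, float)]
    log_w = 0.0
    for i in range(1, n_res + 1):
        dirs = rng.normal(size=(n_trials, 3))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        cand = chain[-1] + BOND * dirs
        reach = (n_res - i + 1) * BOND  # longest span still available
        ok = np.linalg.norm(cand - end, axis=1) <= reach
        if len(chain) > 1:  # steric check against all but the previous atom
            prev = np.array(chain[:-1])
            dmin = np.linalg.norm(cand[:, None, :] - prev[None], axis=2).min(axis=1)
            ok &= dmin >= CLASH
        n_ok = int(ok.sum())
        if n_ok == 0:
            return None, -np.inf
        chain.append(cand[rng.choice(np.flatnonzero(ok))])
        log_w += np.log(n_ok / n_trials)
    return np.array(chain[1:]), log_w
```

The log weights let an ensemble of grown loops be reweighted toward the uniform distribution over feasible chains; an energy term would enter as an extra Boltzmann factor on each candidate.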
Project description: In this paper, we introduce a method to account for the shape of the potential energy curve in the evaluation of conformational free energies. The method is based on a procedure that generates a set of conformations, each with its own force-field energy, but adds to this energy a term that favors conformations that are close in structure (have a low RMSD) to other conformations. The sum of the force-field energy and the RMSD-dependent term is defined here as the "colony energy" of a given conformation, because each conformation that is generated is viewed as representing a colony of points. The use of the colony energy tends to select conformations that are located in broad energy basins. The approach is applied to the ab initio prediction of the conformations of all the loops in a dataset of 135 nonredundant proteins. With the RMSD from the native structure computed after superposition of the loop stems, the average RMSDs of 5-, 6-, 7-, and 8-residue loops are 0.85, 0.92, 1.23, and 1.45 Å, respectively. For 8-residue loops, 60 of 61 predictions have an RMSD of less than 3.0 Å. The colony energy significantly improves on the results obtained with the potential function alone. (The loop prediction program, "Loopy," can be downloaded at http://trantor.bioc.columbia.edu.)
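One way to write the colony energy is G_i = -kT ln Σ_j exp[-E_j/kT - (rmsd_ij/rmsd_ref)^p], so that a conformation gains stability from low-energy neighbors that are structurally close to it. The sketch below implements that form; the cubic exponent, kT = 0.6 and the default reference RMSD (the mean pairwise RMSD) are assumptions rather than published parameters.

```python
import numpy as np


def colony_energies(energies, rmsd, kT=0.6, rmsd_ref=None, power=3):
    """G_i = -kT * ln( sum_j exp(-E_j/kT - (rmsd_ij/rmsd_ref)**power) ).

    energies: (N,) force-field energies; rmsd: (N, N) pairwise RMSD matrix.
    rmsd_ref defaults to the mean off-diagonal RMSD (an assumption).
    """
    e = np.asarray(energies, float)
    d = np.asarray(rmsd, float)
    n = len(e)
    if rmsd_ref is None:
        rmsd_ref = d[~np.eye(n, dtype=bool)].mean()
    x = -e[None, :] / kT - (d / rmsd_ref) ** power  # x[i, j]
    m = x.max(axis=1, keepdims=True)                # log-sum-exp for stability
    return -kT * (m[:, 0] + np.log(np.exp(x - m).sum(axis=1)))
```

With this form, an isolated conformation can have the lowest force-field energy yet a higher colony energy than members of a broad, well-populated basin, which is the selection behavior described above.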
Project description: BACKGROUND: Computational approaches for determining the biologically active (native) three-dimensional structures of proteins with novel sequences have to handle several challenges. The (conformation) space of possible three-dimensional arrangements of the chain of amino acids that constitutes a protein molecule is vast and high-dimensional. This space is explored by sampling, biased by an internal energy that sums atomic interactions. Even state-of-the-art energy functions that quantify such interactions are inherently inaccurate and give protein conformation spaces overly rugged energy surfaces riddled with artifactual local minima. The response to these challenges in template-free protein structure prediction is to generate large numbers of low-energy conformations (also referred to as decoys), increasing the likelihood of obtaining a diverse decoy dataset that covers a sufficient number of local minima possibly housing near-native conformations. RESULTS: In this paper we pursue a complementary approach and propose to directly control the diversity of generated decoys. Inspired by hard optimization problems in high-dimensional and nonlinear variable spaces, we propose that conformation sampling for decoy generation is more naturally framed as a multi-objective optimization problem. We demonstrate that mechanisms inherent to evolutionary search techniques facilitate such framing and allow balancing multiple objectives in protein conformation sampling. We showcase an operationalization of this idea via a novel evolutionary algorithm that has high exploration capability and is also able to access lower-energy regions of the energy landscape of a given protein with similar or better proximity to the known native structure than several state-of-the-art decoy generation algorithms.
CONCLUSIONS: The presented results constitute a promising research direction for improving decoy generation in template-free protein structure prediction with regard to balancing multiple conflicting objectives under an optimization framework. Future work will consider additional optimization objectives and variants of improvement and selection operators to apportion a fixed computational budget. Of particular interest are research directions that reduce the dependence on protein energy models.
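Framing decoy generation as multi-objective optimization replaces a single energy ranking with Pareto dominance. A minimal sketch of a non-dominated filter, assuming all objectives are minimized (for example, energy and the negative distance to the nearest existing decoy as a diversity term); the objectives and the function name are illustrative, not the authors' algorithm:

```python
def pareto_front(points):
    """Indices of non-dominated points when every objective is minimized.

    points: sequence of equal-length tuples, e.g. (energy, -novelty).
    p dominates q if p is no worse in every objective and strictly
    better in at least one.
    """
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    return [i for i, q in enumerate(points)
            if not any(dominates(p, q) for j, p in enumerate(points) if j != i)]


print(pareto_front([(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]))  # [0, 1, 3]
```

This quadratic-time check is enough for small populations; evolutionary algorithms typically apply it repeatedly to rank a population into successive fronts.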
Project description: Although proteins are a fundamental unit in biology, the mechanism by which they fold into their native state is not well understood. In this work, we explore the assembly of secondary structure units via geometric constraint-based simulations and the effect of refining the assembled structures with reservoir replica exchange molecular dynamics. Our approach exploits two crucial features of these methods: (i) geometric simulations speed up the search for native-like topologies because there are no energy barriers to overcome; and (ii) molecular dynamics identifies the low free-energy structures and further refines them toward the actual native conformation. We use eight alpha-, beta-, and alpha/beta-proteins to test our method. The geometric simulations of our test set result in an average RMSD from native of 3.7 Å, which falls to 2.7 Å after refinement. We also examine how robust the assembly is to inaccurate (shifted and shortened) secondary structure. We find that the RMSD from native depends strongly on the accuracy of the secondary structure input, and even slightly shifting the location of secondary structure along the amino acid sequence can rapidly increase the RMSD from native because of incorrect packing.
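Replica exchange refinement rests on the Metropolis criterion for swapping configurations between replicas at temperatures T_i and T_j: accept with probability min(1, exp[(β_i - β_j)(E_i - E_j)]), where β = 1/(k_B T). A minimal sketch that swaps only energies (standing in for full configurations), with k_B in kcal/(mol·K); the reservoir variant, in which the highest-temperature replica exchanges with pre-generated structures, is not shown:

```python
import math
import random


def attempt_swap(energies, temps, i, j, rng=random, k_b=0.0019872):
    """Metropolis test for exchanging replicas i and j.

    Accept with probability min(1, exp[(beta_i - beta_j) * (E_i - E_j)]),
    beta = 1 / (k_B * T). On acceptance the energies (standing in for the
    configurations) are swapped in place. Returns True if accepted.
    """
    beta_i = 1.0 / (k_b * temps[i])
    beta_j = 1.0 / (k_b * temps[j])
    delta = (beta_i - beta_j) * (energies[i] - energies[j])
    if delta >= 0 or rng.random() < math.exp(delta):
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

A swap that hands a lower-energy configuration to the colder replica is always accepted, which is how low free-energy structures found at high temperature migrate down the temperature ladder.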
Project description: A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling with the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent CASPR exercise. REMD in combination with currently available force fields was found to sample near-native conformational states starting from high-quality homology models. Conformations in which the backbone RMSD of secondary structure elements (SSE-RMSD) was 0.5-1.0 Å lower than the starting value were found for 15 of the 21 cases (average 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with an SSE-RMSD at least 0.2 Å lower than the starting value were found among the five best-ranked structures in 11 of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with the lowest energy. This suggests that while the proposed protocol proved effective for refining high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other ways in which the methodology could be further improved are discussed.
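SSE-RMSD restricts the comparison to residues in helices and strands. A minimal sketch, assuming one backbone atom (e.g. Cα) per residue, a boolean secondary-structure mask, and superposition on the masked residues only, using the Kabsch singular-value formula for the optimal RMSD; the function names are illustrative:

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)
    # Flip the smallest singular value if the best fit would be a reflection.
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    e = (p ** 2).sum() + (q ** 2).sum() - 2.0 * s.sum()
    return float(np.sqrt(max(e, 0.0) / len(p)))


def sse_rmsd(model, native, sse_mask):
    """Backbone RMSD over secondary-structure residues only.

    model, native: (N, 3) coordinates, one backbone atom per residue.
    sse_mask: boolean (N,) array, True for helix/strand residues.
    """
    m = np.asarray(sse_mask, bool)
    return kabsch_rmsd(np.asarray(model, float)[m], np.asarray(native, float)[m])
```

Because superposition uses only the masked residues, loop rearrangements neither contribute to nor distort the SSE-RMSD, which is why it can improve even when the global RMSD does not.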
Project description: BACKGROUND: Knowledge-based potential functions are a powerful tool for protein structure evaluation. A variety of formulations that evaluate single or multiple structural features of proteins have been developed and studied. Their performance is often evaluated by their ability to discriminate native structures from decoy structures of target proteins. A function that can evaluate coarse-grained structures is advantageous in many respects, for example because model structures are relatively easy to generate and manipulate; however, reducing the structural representation often degrades structure discrimination performance. RESULTS: We developed a knowledge-based pseudo-energy function for protein structure discrimination. The function (Discriminating Function using Main-chain Atom Coordinates, DFMAC) consists of six pseudo-energy components that deal with different structural features. Only the main-chain N, Cα, and C atom coordinates of each amino acid residue are required as input for structure evaluation. The 231 target structures in 12 different types of decoy sets were split into 154 and 77 targets, used respectively for training the function and for the subsequent performance test. In the test set, 59 (76.6%) native and 68 (88.3%) near-native (< 2.0 Å Cα RMSD) targets were successfully identified. With the tuned parameters, the average Cα RMSD for the test set was 1.174 Å. Most of the discrimination performance was attributable to the orientation-dependent component. CONCLUSION: Despite the reduced representation of input structures, DFMAC showed considerable structure discrimination ability. The function can be applied to the identification of near-native structures in structure prediction experiments.
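Main-chain N, Cα and C coordinates are enough to recover the backbone torsions φ and ψ, one kind of orientation information that a main-chain-only function can draw on. A minimal sketch using the standard atan2 dihedral formula; which features DFMAC actually computes is not specified here, and the function names are illustrative:

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees for four points (IUPAC sign convention)."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1  # part of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1  # part of b2 perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def backbone_phi_psi(n, ca, c):
    """phi/psi per residue from (L, 3) arrays of N, Cα and C coordinates.

    phi is undefined for the first residue and psi for the last (None).
    """
    n, ca, c = (np.asarray(a, float) for a in (n, ca, c))
    L = len(ca)
    phi = [None] + [dihedral(c[i - 1], n[i], ca[i], c[i]) for i in range(1, L)]
    psi = [dihedral(n[i], ca[i], c[i], n[i + 1]) for i in range(L - 1)] + [None]
    return phi, psi
```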