Can molecular dynamics simulations help in discriminating correct from erroneous protein 3D models?
ABSTRACT: BACKGROUND: Recent approaches for predicting the three-dimensional (3D) structure of proteins, such as de novo or fold recognition methods, mostly rely on simplified energy potential functions and a reduced representation of the polypeptide chain. These simplifications facilitate the exploration of the protein conformational space but do not fully capture the subtle relationship that exists between the amino acid sequence and its native structure. It has been proposed that physics-based energy functions together with techniques for sampling the conformational space, e.g., Monte Carlo or molecular dynamics (MD) simulations, are better suited to modelling proteins at higher resolutions than the models obtained with the former type of methods. In this study we monitor different protein structural properties along MD trajectories to discriminate correct from erroneous models. These models are based on the sequence-structure alignments provided by our fold recognition method, FROST. We define correct models as those built from alignments of sequences with structures similar to their native structures, and erroneous models as those built from alignments of sequences with structures unrelated to their native structures. RESULTS: For three test sequences whose native structures belong to the all-alpha, all-beta and alpha/beta classes, we built a set of models intended to cover the whole spectrum: from a perfect model, i.e., the native structure, to a very poor model, i.e., a random alignment of the test sequence with a structure belonging to another structural class, including several intermediate models based on fold recognition alignments. We submitted these models to 11 ns of MD simulations at three different temperatures. 
We monitored along the corresponding trajectories the mean of the root-mean-square deviations (RMSd) with respect to the initial conformation, the RMSd fluctuations, the number of conformation clusters, the evolution of secondary structures and the surface area of residues. None of these criteria alone is 100% efficient in discriminating correct from erroneous models. The mean RMSd, RMSd fluctuations, secondary structure and clustering of conformations show some false positives, whereas the residue surface area criterion shows false negatives. However, if we consider these criteria in combination, it is straightforward to discriminate the two types of models. CONCLUSION: The ability to discriminate correct from erroneous models allows us to improve the specificity and sensitivity of our fold recognition method for a number of ambiguous cases.
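As an illustration of two of the criteria above, the mean RMSd and the RMSd fluctuation can be computed from a series of RMSd values sampled along a trajectory. This is only a minimal sketch: the function name `summarize_rmsd` and the example values are hypothetical, the fluctuation is taken here to be the population standard deviation (an assumption, since the abstract does not define it), and the RMSd values themselves are assumed to have been computed beforehand by an MD analysis tool.

```python
from statistics import fmean, pstdev


def summarize_rmsd(rmsd_series):
    """Return (mean, fluctuation) of an RMSd time series in angstroms.

    The fluctuation is taken as the population standard deviation,
    which is an assumption of this sketch.
    """
    if not rmsd_series:
        raise ValueError("empty RMSd series")
    return fmean(rmsd_series), pstdev(rmsd_series)


# Hypothetical RMSd values (angstroms) with respect to the initial conformation.
stable = [1.0, 1.2, 1.1, 1.3, 1.2]    # model that stays close to its start
drifting = [1.0, 2.5, 4.0, 5.5, 7.0]  # model that moves away from its start

stable_mean, stable_fluct = summarize_rmsd(stable)
drift_mean, drift_fluct = summarize_rmsd(drifting)
```

A model whose mean and fluctuation are both small behaves like the stable example; how the individual criteria are thresholded and combined is not specified in the abstract and is left out of this sketch.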
Project description: For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find folds similar to the native with an average root-mean-square deviation (RMSD) from native of 2.5 Å and approximately 82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The TASSER algorithm was applied to build full-length models, where continuous fragments are excised from the top-scoring templates and reassembled under the guidance of an optimized force field, which includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (all but 2 of 1,489), the resultant full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of full-length models is 2.25 Å, with aligned regions improved from 2.5 Å to 1.88 Å, comparable with the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results strongly suggest that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments.
Project description: Sampling enrichment toward a target state, an analogue of the improvement of sampling efficiency (SE), is critical in both the refinement of protein structures and the generation of near-native structure ensembles for the exploration of structure-function relationships. We developed a hybrid molecular dynamics (MD)-Monte Carlo (MC) approach to enrich the sampling toward the target structures. In this approach, the higher SE is achieved by perturbing the conventional MD simulations with an MC structure-acceptance judgment, which is based on the degree of agreement between the small-angle X-ray scattering (SAXS) intensity profiles of the simulation structures and of the target structure. We found that the hybrid simulations could significantly improve SE by making the top-ranked models much closer to the target structures in both secondary and tertiary structure. Specifically, for the 20 mono-residue peptides, when the initial structures had a root-mean-square deviation (RMSD) from the target structure smaller than 7 Å, the hybrid MD-MC simulations yielded structures that were, on average, 0.83 Å and 1.73 Å closer in RMSD to the target than those from the parallel MD simulations at 310 K and 370 K, respectively. Meanwhile, the average SE values also increased by 13.2% and 15.7%. The enrichment of sampling becomes more significant when the target states are gradually detectable in the MD-MC simulations in comparison with the parallel MD simulations, and provides >200% improvement in SE. We also tested the hybrid MD-MC approach on real protein systems; the results showed that the SE was improved for 3 out of 5 real proteins. Overall, this work presents an efficient way of utilizing solution SAXS to improve protein structure prediction and refinement, as well as the generation of near-native structures for function annotation.
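The MC structure-acceptance judgment described above can be pictured with a Metropolis-style rule on a profile discrepancy. The sketch below is an assumption throughout, not the authors' stated procedure: the abstract does not say how profile agreement is scored or which acceptance rule is used, so the mean-squared discrepancy, the Metropolis criterion, and the names `profile_discrepancy`, `accept_structure` and `tolerance` are all hypothetical.

```python
import math
import random


def profile_discrepancy(profile, target):
    """Mean squared difference between two intensity profiles (assumed metric)."""
    if len(profile) != len(target) or not target:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(profile, target)) / len(target)


def accept_structure(old_score, new_score, tolerance, rng=random.random):
    """Metropolis-style test (assumed rule): always accept an improvement,
    otherwise accept with probability exp(-(new - old) / tolerance)."""
    if new_score <= old_score:
        return True
    return rng() < math.exp(-(new_score - old_score) / tolerance)


# Hypothetical three-point intensity profiles.
target = [1.0, 0.5, 0.2]
current = [1.0, 0.7, 0.4]
proposed = [1.0, 0.6, 0.3]
move_accepted = accept_structure(
    profile_discrepancy(current, target),
    profile_discrepancy(proposed, target),
    tolerance=0.01,
)
```

In a hybrid scheme of this kind, a short MD segment would propose the new structure and a rejected proposal would restart from the previous one; those steps are outside this sketch.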
Project description: Molecular dynamics (MD) simulations with both implicit and explicit solvent models have been carried out to study the folding dynamics of the HP-36 protein. Starting from the extended conformation, the secondary structure of all three helices in HP-36 was formed in about 50 ns and remained stable for the rest of the simulation. However, the formation of the tertiary structure was difficult. Although some intermediates were close to the native structure, the overall conformation was not stable. Further analysis revealed that the large structural fluctuations of the loop and hydrophobic core regions contributed most to the instability of the structure during MD simulation. The backbone root-mean-square deviation (RMSD) of the loop and hydrophobic core regions showed a strong correlation with the backbone RMSD of the whole protein. The free energy landscape indicated that the distribution of main-chain torsions in the loop and turn regions was far from that of the native state. Starting from an intermediate structure extracted from the initial AMBER simulation, HP-36 was found to generally fold to the native state under the dynamically adjusted polarized protein-specific charge (DPPC) simulation, while the peptide did not fold into the native structure when the AMBER force field was used. The two best folded structures were extracted and subjected to further simulations in water employing AMBER03 charges and DPPC for 25 ns. The results showed that introducing polarization effects into the interaction potential could stabilize the near-native protein structure.
Project description: Knowing atomistic details of proteins is essential not only for the understanding of protein function but also for the development of drugs. Experimental methods such as X-ray crystallography, NMR, and cryo-electron microscopy (cryo-EM) are the preferred forms of protein structure determination and have achieved great success over recent decades. Computational methods may be an alternative when experimental techniques fail. However, computational methods are severely limited when it comes to predicting larger macromolecule structures with little sequence similarity to known structures. The incorporation of experimental restraints in computational methods is becoming increasingly important to more reliably predict protein structure. One such experimental input used in structure prediction and refinement is cryo-EM densities. Recent advances in cryo-EM have arguably revolutionized the field of structural biology. Our previously developed cryo-EM-guided Rosetta-MD protocol has shown great promise in the refinement of soluble protein structures. In this study, we extended cryo-EM density-guided iterative Rosetta-MD to membrane proteins. We also improved the methodology in general by picking models based on a combination of their score and fit-to-density during the Rosetta model selection. By doing so, we have been able to pick models superior to those obtained with the previous selection based on Rosetta score only, and we have been able to further improve our previously refined models of soluble proteins. The method was tested with five membrane-spanning protein structures. By applying density-guided Rosetta-MD iteratively, we were able to refine the predicted structures of these membrane proteins to atomic resolutions. We also showed that the resolution of the density maps determines the improvement and quality of the refined models. 
By incorporating high-resolution density maps (~4 Å), we were able to improve the quality of the models more significantly than when medium-resolution maps (6.9 Å) were used. Beginning from an average starting structure root-mean-square deviation (RMSD) to native of 4.66 Å, our protocol was able to refine the structures to bring the average refined structure RMSD to 1.66 Å when 4 Å density maps were used. The protocol also successfully refined the HIV-1 CTD guided by an experimental 5 Å density map.
Project description: The first important step in structure-based virtual screening is the judicious selection of a receptor protein. In cases where the holo protein receptor structure is unavailable, a significant reduction in virtual screening performance has been reported. In this work, we present a robust method to generate reliable holo protein structure conformations from apo structures using molecular dynamics (MD) simulation with restraints derived from holo structure binding-site templates. We perform benchmark tests on two different datasets: 40 structures from the Directory of Useful Decoys, Enhanced (DUD-E) and 84 structures from the Gunasekaran dataset. Our results show successful refinement of apo binding-site structures toward holo conformations in 82% of the test cases. In addition, the virtual screening performance of the 40 DUD-E structures is significantly improved using our MD-refined structures as receptors, with an average enrichment factor at 1% (EF1%) of 6.2 compared to 3.5 for the apo structures. Docking of native ligands to the refined structures shows an average ligand root-mean-square deviation (RMSD) of 1.97 Å (DUD-E dataset and Gunasekaran dataset) relative to ligands in the holo crystal structures, which is comparable to the self-docking (i.e., docking of the native ligand back to its crystal structure receptor) averages of 1.34 Å (DUD-E dataset) and 1.36 Å (Gunasekaran dataset). On the other hand, docking to the apo structures yields an average ligand RMSD of 3.65 Å (DUD-E) and 2.90 Å (Gunasekaran). These results indicate that our method is robust and can be useful to improve the virtual screening performance of apo structures.
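For readers unfamiliar with the enrichment factor, the commonly used definition is the fraction of actives among the top-ranked x% of a screened library divided by the fraction of actives in the whole library. The following is a minimal sketch of that general definition, not of the authors' pipeline; the function name, the rounding of the top-fraction size, and the convention that a lower docking score is better are assumptions of this example.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the given top fraction of a ranked library.

    Assumes lower scores rank better (typical for docking energies).
    """
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_total != len(is_active) or n_actives == 0:
        raise ValueError("need matching, non-empty inputs with at least one active")
    # Size of the top slice; rounding to at least one compound is an assumption.
    n_top = max(1, round(n_total * fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    hits_top = sum(1 for _, flag in ranked[:n_top] if flag)
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical library of 100 compounds with 10 actives.
scores = [float(i) for i in range(100)]
perfect = [i < 10 for i in range(100)]      # all actives ranked first
uniform = [i % 10 == 0 for i in range(100)]  # actives spread evenly

ef_perfect = enrichment_factor(scores, perfect, fraction=0.01)
ef_uniform = enrichment_factor(scores, uniform, fraction=0.10)
```

An enrichment factor near 1 corresponds to a ranking no better than random selection, which is the baseline against which values such as 6.2 and 3.5 are read.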
Project description: In this study, we examined the folding processes of eight helical proteins (2I9M, TC5B, 1WN8, 1V4Z, 1HO2, 1HLL, 2KFE, and 1YYB) at room temperature using an explicit solvent model under the AMBER14SB force field, with both accelerated molecular dynamics (AMD) and traditional molecular dynamics (MD). We analyzed and compared the simulation results obtained by these two methods based on several aspects, such as root-mean-square deviation (RMSD), native contacts, cluster analysis, folding snapshots, free energy landscape, and the evolution of the radius of gyration, which showed that these eight proteins were successfully and consistently folded into the corresponding native structures by AMD simulations carried out at room temperature. In addition, folding occurred within 40-180 ns after starting from the linear structures of the eight proteins at 300 K. By contrast, these stable folded structures were not found when traditional MD simulation was used. The influence of high temperatures (350, 400, and 450 K) was also investigated. We found that the simulation efficiency of AMD is higher than that of MD, regardless of the temperature. Of these temperatures, 300 K is the most suitable for protein folding for all systems. To further investigate the efficiency of AMD, another trajectory was simulated for the eight proteins with the same linear structures but different random seeds at 300 K. Both AMD trajectories reached the correct folded structures. Our results clearly show that AMD simulation is a highly efficient and reliable method for the study of protein folding.
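One of the quantities listed above, the radius of gyration, follows the standard mass-weighted definition: the square root of the mass-weighted mean squared distance of the atoms from their center of mass. The sketch below applies that definition to a single frame; the function name and the default of unit masses are assumptions of this example, and in practice the value would be tracked frame by frame with a trajectory analysis tool.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one frame.

    coords: sequence of (x, y, z) tuples; masses default to 1.0 per atom.
    """
    if not coords:
        raise ValueError("no coordinates given")
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Two unit-mass points 2 units apart have a radius of gyration of 1.
rg_pair = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

A chain that starts from a linear structure has a large radius of gyration that decreases as it compacts, which is why its evolution is a convenient folding indicator.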
Project description: The brevity of molecular dynamics simulations often limits their utility in developing and evaluating structural models of proteins. The duration of simulations can be increased greatly using discrete molecular dynamics (DMD). However, the trade-off is that coarse graining, implicit solvent, and other time-saving procedures reduce the accuracy of DMD simulations. Here we address some of these issues by comparing results of DMD and conventional all-atom MD simulations on proteins of known structure and misfolded proteins. DMD simulations were performed at a range of temperatures to identify a 'physiological' temperature for DMD that mimicked the molecular motions of conventional MD simulations at 310 K. We also compared results obtained with a new implicit solvent model, developed here based on the Miyazawa-Jernigan interaction pair potential, to those obtained with a previously used model based on the Kyte-Doolittle hydropathy scale. We compared DMD and all-atom molecular dynamics with explicit water by simulating both correctly and incorrectly folded structures, and monomeric and dimeric ? β-barrel structures, to analyze the ability of these procedures to distinguish between good and bad models. Deviations from the correct structures were substantially greater with DMD, as would be expected from coarse graining and longer simulation time. Deviations were smallest for β-strands and greatest for coiled loops. Structures of the incorrectly folded models were very poorly preserved during the DMD simulations, but both methods were able to distinguish between the correct and the incorrect structures based on differences in the magnitudes of the root-mean-square deviation (RMSD) from the starting conformation.
Project description: Computational protein tertiary structure prediction has made significant progress over the past years. However, most of the existing structure prediction methods are not equipped with functionality to predict the accuracy of constructed models. Knowing the accuracy of a structure model is crucial for its practical use, since the accuracy determines the potential applications of the model. Here we have developed quality assessment methods that predict real values of the global and local quality of protein structure models. The global quality of a model is defined as the root-mean-square deviation (RMSD) and the LGA score to its native structure. The local quality is defined as the distance between the corresponding Cα positions of a model and its native structure when they are superimposed. Three regression methods are employed to combine different types of quality assessment measures of models, including alignment-level scores, residue-position-level scores, atomic-detailed structure-level scores and composite scores. The regression models were tested on a large benchmark data set of template-based protein structure models of various qualities. In predicting RMSD and the LGA score, a combination of two terms, length-normalized SPAD (a score that assesses alignment stability by considering suboptimal alignments) and Verify3D normalized by the square of the model length, performs well, achieving 97.1 and 83.6% accuracy in identifying models with an RMSD of <2 and 6 Å, respectively. For predicting the local quality of models, we find that a two-step approach, in which the global RMSD predicted in the first step is further combined with the other terms, can dramatically increase the accuracy. Finally, the developed regression equations are applied to assess the quality of structure models of the whole E. coli proteome.
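The local and global quality definitions above can be made concrete with a small example. The sketch assumes that the model and native Cα coordinates are already superimposed and listed in the same residue order; the superposition itself and the LGA score are not covered, and the function names are hypothetical.

```python
import math


def local_quality(model_ca, native_ca):
    """Per-residue distance between corresponding C-alpha positions.

    Assumes both coordinate lists are already superimposed and aligned.
    """
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    return [math.dist(m, n) for m, n in zip(model_ca, native_ca)]


def global_rmsd(model_ca, native_ca):
    """Root-mean-square of the per-residue C-alpha distances."""
    distances = local_quality(model_ca, native_ca)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical two-residue example: one residue matches, one is 5 units off.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
per_residue = local_quality(model, native)
rmsd = global_rmsd(model, native)
```

These are the target values a quality assessment method tries to predict; when assessing a real model the native structure is unknown, so they are only available on benchmark sets.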
Project description: A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling and the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent CASPR exercise. It was found that REMD in combination with currently available force fields could sample near-native conformational states starting from high-quality homology models. Conformations in which the backbone RMSD of secondary structure elements (SSE-RMSD) was lower than the starting value by 0.5-1.0 Å were found for 15 out of the 21 cases (average 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with an SSE-RMSD at least 0.2 Å lower than the starting value were found among the five best-ranked structures in 11 out of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with the lowest energy. This suggests that while the proposed protocol proved effective for the refinement of high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other aspects by which the methodology could be further improved are discussed.
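Temperature-based REMD periodically attempts to swap configurations between replicas at neighbouring temperatures. The sketch below shows the standard Metropolis exchange criterion, p = min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T). It illustrates the general technique rather than the specific settings of this protocol; the function name and the choice of kcal/mol energy units are assumptions.

```python
import math

# Boltzmann constant in kcal/(mol*K); the energy unit is an assumption of this sketch.
K_B = 0.0019872041


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Standard REMD acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # Non-negative delta means the swap is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


# The colder replica (300 K) holds the higher energy: the swap is always accepted.
p_favourable = exchange_probability(-90.0, 300.0, -100.0, 310.0)
# The colder replica already holds the lower energy: the swap is accepted only sometimes.
p_unfavourable = exchange_probability(-100.0, 300.0, -90.0, 310.0)
```

Accepted swaps let low-energy conformations found at high temperature migrate to the low-temperature replicas, which is what makes the method useful for sampling near-native states.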
Project description: One of the critical difficulties of molecular dynamics (MD) simulations in protein structure refinement is that the physics-based energy landscape lacks a middle-range funnel to guide nonnative conformations toward near-native states. We propose to use the target model as a probe to identify fragmental analogs from the PDB. The distance maps are then used to reshape the MD energy funnel. The protocol was tested on 181 benchmarking and 26 CASP targets. It was found that structure models of correct folds with TM-score >0.5 can often be pulled closer to native with a higher GDT-HA score, but improvements for the models of incorrect folds (TM-score <0.5) are much less pronounced. These data indicate that template-based fragmental distance maps essentially reshaped the MD energy landscape from golf-course-like to funnel-like in the successfully refined targets, with a radius of TM-score ~0.5. These results demonstrate a new avenue to improve high-resolution structures by combining knowledge-based template information with physics-based MD simulations.
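The TM-score thresholds quoted above refer to a length-normalized similarity measure, TM = (1/L) Σ 1/(1 + (d_i/d_0)²) with d_0 = 1.24 (L - 15)^(1/3) - 1.8, where L is the target length and d_i are distances between aligned residue pairs. The sketch below evaluates this sum for one given superposition only, whereas the published score is the maximum over superpositions; the function names are hypothetical, and the sketch simply refuses chains too short for the d_0 formula to be positive.

```python
def tm_d0(target_length):
    """Distance scale d0 of the TM-score for a target of the given length."""
    if target_length < 19:
        # For shorter chains this formula is not positive; not handled in this sketch.
        raise ValueError("target too short for the d0 formula used here")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score sum for one fixed superposition.

    distances: distances between aligned residue pairs, in angstroms.
    Unaligned target residues contribute zero because the sum is
    normalized by the full target length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical 100-residue target.
perfect = tm_score([0.0] * 100, 100)          # identical structures
half_aligned = tm_score([0.0] * 50, 100)      # only half of the target aligned
at_scale = tm_score([tm_d0(100)] * 100, 100)  # every pair exactly d0 apart
```

Because the score is normalized by the target length, values around 0.5 are comparable across proteins of different sizes, which is what makes a fixed threshold meaningful.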