How do potentials derived from structural databases relate to "true" potentials?
ABSTRACT: Knowledge-based potentials are used widely in protein folding and inverse folding algorithms. Two kinds of derivation methods are used. (1) The interactions in a database of known protein structures are assumed to obey a Boltzmann distribution. (2) The stability of the native folds relative to a manifold of misfolded structures is optimized. Here, a set of previously derived contact and secondary structure propensity potentials, taken as the "true" potentials, are employed to construct an artificial protein structural database from protein fragments. Then, new sets of potentials are derived to see how they are related to the true potentials. Using the Boltzmann distribution method, when the stability of the structures in the database lies within a certain range, both contact potentials and secondary structure propensities can be derived separately with remarkable accuracy. In general, the optimization method was found to be less accurate due to errors in the "excess energy" contribution. When the excess energy terms are kept as a constraint, the true potentials are recovered exactly.
Project description:Recent advances in understanding protein folding have benefitted from coarse-grained representations of protein structures. Empirical energy functions derived from these techniques occasionally succeed in distinguishing native structures from their corresponding ensembles of nonnative folds or decoys which display varying degrees of structural dissimilarity to the native proteins. Here we utilized atomic coordinates of single protein chains, comprising a large diverse training set, to develop and evaluate twelve all-atom four-body statistical potentials obtained by exploring alternative values for a pair of inherent parameters. Delaunay tessellation was performed on the atomic coordinates of each protein to objectively identify all quadruplets of interacting atoms, and atomic potentials were generated via statistical analysis of the data and implementation of the inverted Boltzmann principle. Our potentials were evaluated using benchmarking datasets from Decoys-'R'-Us, and comparisons were made with twelve other physics- and knowledge-based potentials. Ranking 3rd, our best potential tied CHARMM19 and surpassed AMBER force field potentials. We illustrate how a generalized version of our potential can be used to empirically calculate binding energies for target-ligand complexes, using HIV-1 protease-inhibitor complexes for a practical application. The combined results suggest an accurate and efficient atomic four-body statistical potential for protein structure prediction and assessment.
Project description:A method for deriving all-atom protein folding potentials is presented and tested on a three-helix bundle protein, as well as on hairpin and helical sequences. The potentials obtained are composed of a contact term between pairs of atoms, and a local density term for each atom, mimicking solvent exposure preferences. Using this potential in an all-atom protein folding simulation, we repeatedly folded the three-helix bundle, with the lowest energy conformations having a C(alpha) distance rms from the native structure of less than 2 A. Similar results were obtained for the hairpin and helices by using different potentials. We derived potentials for several different proteins and found a high correlation between the derived parameters, suggesting that a potential of this form eventually could be found that folds multiple, unrelated proteins at the atomic level of detail.
Project description:BACKGROUND: For over 30 years potentials of mean force have been used to evaluate the relative energy of protein structures. The most commonly used potentials define the energy of residue-residue interactions and are derived from the empirical analysis of the known protein structures. However, single-body residue 'environment' potentials, although widely used in protein structure analysis, have not been rigorously compared to these classical two-body residue-residue interaction potentials. Here we do not try to combine the two different types of residue interaction potential, but rather to assess their independent contribution to scoring protein structures. RESULTS: A data set of nearly three thousand monomers was used to compare pairwise residue-residue 'contact-type' propensities to single-body residue 'contact-count' propensities. Using a large and standard set of protein decoys we performed an in-depth comparison of these two types of residue interaction propensities. The scores derived from the contact-type and contact-count propensities were assessed using two different performance metrics and were compared using 90 different definitions of residue-residue contact. Our findings show that both types of score perform equally well on the task of discriminating between near-native protein decoys. However, in a statistical sense, the contact-count based scores were found to carry more information than the contact-type based scores. CONCLUSION: Our analysis has shown that the performance of either type of score is very similar on a range of different decoys. This similarity suggests a common underlying biophysical principle for both types of residue interaction propensity. However, several features of the contact-count based propensity suggests that it should be used in preference to the contact-type based propensity. Specifically, it has been shown that contact-counts can be predicted from sequence information alone. In addition, the use of a single-body term allows for efficient alignment strategies using dynamic programming, which is useful for fold recognition, for example. These facts, combined with the relative simplicity of the contact-count propensity, suggests that contact-counts should be studied in more detail in the future.
Project description:We propose a novel and flexible derivation scheme of statistical, database-derived, potentials, which allows one to take simultaneously into account specific correlations between several sequence and structure descriptors. This scheme leads to the decomposition of the total folding free energy of a protein into a sum of lower order terms, thereby giving the possibility to analyze independently each contribution and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addition, this derivation scheme appears as quite general, for many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.
Project description:We develop coarse-grained, distance- and orientation-dependent statistical potentials from the growing protein structural databases. For protein structural classes (alpha, beta, and alpha/beta), a substantial number of backbone-backbone and backbone-side-chain contacts stabilize the native folds. By taking into account the importance of backbone interactions with a virtual backbone interaction center as the 21st anisotropic site, we construct a 21 x 21 interaction scheme. The new potentials are studied using spherical harmonics analysis (SHA) and a smooth, continuous version is constructed using spherical harmonic synthesis (SHS). Our approach has the following advantages: (1) The smooth, continuous form of the resulting potentials is more realistic and presents significant advantages for computational simulations, and (2) with SHS, the potential values can be computed efficiently for arbitrary coordinates, requiring only the knowledge of a few spherical harmonic coefficients. The performance of the new orientation-dependent potentials was tested using a standard database of decoy structures. The results show that the ability of the new orientation-dependent potentials to recognize native protein folds from a set of decoy structures is strongly enhanced by the inclusion of anisotropic backbone interaction centers. The anisotropic potentials can be used to develop realistic coarse-grained simulations of proteins, with direct applications to protein design, folding, and aggregation.
Project description:We introduce a new type of knowledge-based potentials for protein structure prediction, called 'evolutionary potentials', which are derived using a single experimental protein structure and all three-dimensional models of its homologous sequences. The new potentials have been benchmarked against other knowledge-based potentials, resulting in a significant increase in accuracy for model assessment. In contrast to standard knowledge-based potentials, we propose that evolutionary potentials capture key determinants of thermodynamic stability and specific sequence constraints required for fast folding.
Project description:A residue-based and a heavy atom-based statistical pair potential are developed for use in assessing the strength of protein-protein interactions. To ensure the quality of the potentials, a nonredundant, high-quality dimer database is constructed. The protein complexes in this dataset are checked by a literature search to confirm that they form multimers, and the pairwise amino acid preference to interact across a protein-protein interface is analyzed and pair potentials constructed. The performance of the residue-based potentials is evaluated by using four jackknife tests and by assessing the potentials' ability to select true protein-protein interfaces from false ones. Compared to potentials developed for monomeric protein structure prediction, the interdomain potential performs much better at distinguishing protein-protein interactions. The potential developed from homodimer interfaces is almost the same as that developed from heterodimer interfaces with a correlation coefficient of 0.92. The residue-based potential is well suited for genomic scale protein interaction prediction and analysis, such as in a recently developed threading-based algorithm, MULTIPROSPECTOR. However, the more time-consuming atom-based potential performs better in identifying near-native structures from docking generated decoys.
Project description:BACKGROUND: Accurate description of protein interaction with aqueous solvent is crucial for modeling of protein folding, protein-protein interaction, and drug design. Efforts to build a working description of solvation, both by continuous models and by molecular dynamics, yield controversial results. Specifically constructed knowledge-based potentials appear to be promising for accounting for the solvation at the molecular level, yet have not been used for this purpose. RESULTS: We developed original knowledge-based potentials to study protein hydration at the level of atom contacts. The potentials were obtained using a new Monte Carlo reference state (MCRS), which simulates the expected probability density of atom-atom contacts via exhaustive sampling of structure space with random probes. Using the MCRS allowed us to calculate the expected atom contact densities with high resolution over a broad distance range including very short distances. Knowledge-based potentials for hydration of protein atoms of different types were obtained based on frequencies of their contacts at different distances with protein-bound water molecules, in a non-redundant training data base of 1776 proteins with known 3D structures. Protein hydration sites were predicted in a test set of 12 proteins with experimentally determined water locations. The MCRS greatly improves prediction of water locations over existing methods. In addition, the contribution of the energy of macromolecular solvation into total folding free energy was estimated, and tested in fold recognition experiments. The correct folds were preferred over all the misfolded decoys for the majority of proteins from the improved Rosetta decoy set based on the structure hydration energy alone. CONCLUSION: MCRS atomic hydration potentials provide a detailed distance-dependent description of hydropathies of individual protein atoms. This allows placement of water molecules on the surface of proteins and in protein interfaces with much higher precision. The potentials provide a means to estimate the total solvation energy for a protein structure, in many cases achieving a successful fold recognition. Possible applications of atomic hydration potentials to structure verification, protein folding and stability, and protein-protein interactions are discussed.
Project description:Many statistical potentials were developed in last two decades for protein folding and protein structure recognition. The major difference of these potentials is on the selection of reference states to offset sampling bias. However, since these potentials used different databases and parameter cutoffs, it is difficult to judge what the best reference states are by examining the original programs. In this study, we aim to address this issue and evaluate the reference states by a unified database and programming environment. We constructed distance-specific atomic potentials using six widely-used reference states based on 1022 high-resolution protein structures, which are applied to rank modeling in six sets of structure decoys. The reference state on random-walk chain outperforms others in three decoy sets while those using ideal-gas, quasi-chemical approximation and averaging sample stand out in one set separately. Nevertheless, the performance of the potentials relies on the origin of decoy generations and no reference state can clearly outperform others in all decoy sets. Further analysis reveals that the statistical potentials have a contradiction between the universality and pertinence, and optimal reference states should be extracted based on specific application environments and decoy spaces.
Project description:We have analyzed 29 different published matrices of protein pairwise contact potentials (CPs) between amino acids derived from different sets of proteins, either crystallographic structures taken from the Protein Data Bank (PDB) or computer-generated decoys. Each of the CPs is similar to 1 of the 2 matrices derived in the work of Miyazawa and Jernigan (Proteins 1999;34:49-68). The CP matrices of the first class can be approximated with a correlation of order 0.9 by the formula e(ij) = h(i) + h(j), 1 <or= i, j <or= 20, where the residue-type dependent factor h is highly correlated with the frequency of occurrence of a given amino acid type inside proteins. Electrostatic interactions for the potentials of this class are almost negligible. In the potentials belonging to this class, the major contribution to the potentials is the one-body transfer energy of the amino acid from water to the protein environment. Potentials belonging to the second class can be approximated with a correlation of 0.9 by the formula e(ij) = c(0) - h(i)h(j) + q(i)q(j), where c(0) is a constant, h is highly correlated with the Kyte-Doolittle hydrophobicity scale, and a new, less dominant, residue-type dependent factor q is correlated ( approximately 0.9) with amino acid isoelectric points pI. Including electrostatic interactions significantly improves the approximation for this class of potentials. While, the high correlation between potentials of the first class and the hydrophobic transfer energies is well known, the fact that this approximation can work well also for the second class of potentials is a new finding. We interpret potentials of this class as representing energies of contact of amino acid pairs within an average protein environment.