ABSTRACT: We have developed a non-redundant protein-RNA binding benchmark dataset derived from the available protein-RNA structures in the Protein Data Bank. It consists of 73 complexes with measured binding affinity. The experimental conditions (pH and temperature) for the binding affinity measurements are also listed in the dataset. This binding affinity dataset can be used to compare and develop protein-RNA scoring functions. The binding free energies predicted for the 73 complexes by three available protein-RNA docking scoring functions correlate poorly with the binding Gibbs free energies calculated from Kd.
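The conversion from a measured Kd to a binding Gibbs free energy follows the standard relation ΔG = RT ln(Kd / c°), with c° = 1 M, which is why the measurement temperature matters. The following is a minimal Python sketch of that conversion, assuming Kd is given in molar units and temperature in kelvin. The function name and the example values are illustrative and are not taken from the dataset.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Return the binding Gibbs free energy in kcal/mol from a dissociation constant.

    Uses dG = R * T * ln(Kd / c0) with the standard concentration c0 = 1 M,
    so a tighter binder (smaller Kd) gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Hypothetical example: a 1 nM complex measured at 25 degrees C.
dg = binding_free_energy(1e-9, 298.15)
print(f"dG = {dg:.2f} kcal/mol")
```

Because the temperature enters linearly, the same Kd measured at a different temperature maps to a different ΔG, so the recorded conditions should be used rather than a fixed default.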
Project description:Protein-RNA interactions play an important role in many biological processes. The ability to predict the molecular structures of protein-RNA complexes by docking would be valuable for understanding the underlying chemical mechanisms. We have developed a novel nonredundant benchmark dataset for protein-RNA docking and scoring. The diverse dataset of 72 targets consists of 52 unbound-unbound test complexes and 20 unbound-bound test complexes. Here, unbound-unbound complexes are cases in which both binding partners of the cocrystallized complex are either in apo form or in a conformation taken from a different protein-RNA complex, whereas unbound-bound complexes are cases in which only one of the two binding partners has another experimentally determined conformation. The dataset is classified into three categories according to the interface root mean square deviation and the percentage of native contacts in the unbound structures: 49 easy, 16 medium, and 7 difficult targets. The bound and unbound cases of the benchmark dataset are expected to benefit the development and improvement of docking and scoring algorithms in the docking community. All the easy-to-view structures are freely available to the public at http://zoulab.dalton.missouri.edu/RNAbenchmark/.
Project description:BACKGROUND: Current scoring functions are not very successful in protein-ligand binding affinity prediction despite their popularity in structure-based drug design. Here, we propose a general knowledge-guided scoring (KGS) strategy to tackle this problem. Our KGS strategy computes the binding constant of a given protein-ligand complex based on the known binding constant of an appropriate reference complex. A good training set that includes a sufficient number of protein-ligand complexes with known binding data needs to be supplied for finding the reference complex. The reference complex is required to share a pattern of key protein-ligand interactions similar to that of the complex of interest. Thus, some uncertain factors in protein-ligand binding may cancel out, resulting in a more accurate prediction of absolute binding constants. RESULTS: In our study, an automatic algorithm was developed for summarizing key protein-ligand interactions as a pharmacophore model and identifying the reference complex with maximal similarity to the query complex. Our KGS strategy was evaluated in combination with two scoring functions (X-Score and PLP) on three test sets, containing 112 HIV protease complexes, 44 carbonic anhydrase complexes, and 73 trypsin complexes, respectively. Our results obtained on crystal structures as well as computer-generated docking poses indicated that application of the KGS strategy produced more accurate predictions, especially when X-Score or PLP alone did not perform well. CONCLUSIONS: Compared to other targeted scoring functions, our KGS strategy does not require any re-parameterization or modification of current scoring methods, and its application is not tied to certain systems. The effectiveness of our KGS strategy is in theory proportional to the ever-increasing knowledge of experimental protein-ligand binding data. 
Our KGS strategy may serve as a more practical remedy for current scoring functions to improve their accuracy in binding affinity prediction.
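The reference-based idea described above can be illustrated with a short Python sketch. It rests on two assumptions that the abstract does not spell out. First, the prediction is taken to be the reference complex's experimental value plus the difference in computed scores between the query and the reference, with scores expressed in pKd units. Second, a Tanimoto coefficient over sets of interaction labels stands in for the pharmacophore-based similarity. The class, function names, and data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Complex:
    name: str
    features: frozenset              # key protein-ligand interactions (hypothetical encoding)
    score: float                     # output of a scoring function, in pKd units
    pkd_exp: Optional[float] = None  # experimental pKd, known only for training complexes


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Stand-in similarity measure: shared features divided by all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def kgs_predict(query: Complex, training_set: list) -> tuple:
    """Pick the most similar training complex and correct its known pKd
    by the score difference between the query and that reference (assumed form)."""
    reference = max(training_set, key=lambda c: tanimoto(query.features, c.features))
    predicted = reference.pkd_exp + (query.score - reference.score)
    return predicted, reference


# Hypothetical data for illustration only.
training = [
    Complex("ref_A", frozenset({"hb:ASP25", "hb:GLY27", "hydrophobic:ILE50"}), 6.0, 8.5),
    Complex("ref_B", frozenset({"metal:ZN"}), 5.0, 7.0),
]
query = Complex("query", frozenset({"hb:ASP25", "hydrophobic:ILE50"}), 6.4)

predicted, reference = kgs_predict(query, training)
print(f"reference = {reference.name}, predicted pKd = {predicted:.2f}")
```

In this form, any systematic error that the scoring function makes for both the query and the reference cancels in the score difference, which is the cancellation argument made in the abstract.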
Project description:A central problem in de novo drug design is determining the binding affinity of a ligand for a receptor. A new scoring algorithm is presented that estimates the binding affinity of a protein-ligand complex given a three-dimensional structure. The method, LISA (Ligand Identification Scoring Algorithm), uses an empirical scoring function to describe the binding free energy. Interaction terms have been designed to account for van der Waals (VDW) contacts, hydrogen bonding, desolvation effects, and metal chelation to model the dissociation equilibrium constants using a linear model. Atom types have been introduced to differentiate the parameters for VDW, H-bonding interactions, and metal chelation between different atom pairs. A training set of 492 protein-ligand complexes was selected for the fitting process. Different test sets have been examined to evaluate its ability to predict experimentally measured binding affinities. Comparison with other well-known scoring functions shows that LISA has advantages over many existing scoring functions in modeling protein-ligand binding affinity, especially metalloprotein-ligand binding affinity. An artificial neural network (ANN) was also used to demonstrate that the energy terms in LISA are well designed and do not require extra cross terms.
Project description:Protein-RNA interactions play essential roles in many biological processes. Quantifying the binding affinity of protein-RNA complexes helps in understanding protein-RNA recognition mechanisms and in identifying strong binding partners. Because experimentally measured protein-RNA binding affinity data are still limited, there is a pressing demand for accurate and reliable computational approaches. In this paper, we propose a computational approach, PredPRBA, which can effectively predict protein-RNA binding affinity using gradient boosted regression trees. We build a dataset of protein-RNA binding affinity that includes 103 protein-RNA complex structures manually collected from related literature. Then, we generate 37 kinds of sequence and structural features and explore the relationship between the features and protein-RNA binding affinity. We find that the binding affinity mainly depends on the structure of the RNA molecules. According to the type of RNA bound to the protein in each complex, we split the 103 protein-RNA complexes into six categories. For each category, we build a gradient boosted regression tree (GBRT) model based on the generated features. We perform a comprehensive evaluation of the proposed method on the binding affinity dataset using leave-one-out cross-validation. We show that PredPRBA achieves correlations ranging from 0.723 to 0.897 among the six categories, which is significantly better than other typical regression methods and the pioneering protein-RNA binding affinity predictor SPOT-Seq-RNA. In addition, a user-friendly web server has been developed to predict the binding affinity of protein-RNA complexes. The PredPRBA webserver is freely available at http://PredPRBA.denglab.org/.
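The evaluation protocol described above, leave-one-out cross-validation scored by correlation, can be sketched in a few lines of Python. This is not the PredPRBA model: to keep the example runnable with only the standard library, a one-feature least-squares line is substituted for the gradient boosted regression trees, and the data are synthetic. All function names are hypothetical.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_linear(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic feature values and affinities, for illustration only.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
affinity = [5.1, 5.9, 7.2, 7.8, 9.1, 9.9]

predicted = leave_one_out_predictions(feature, affinity)
print(f"leave-one-out Pearson r = {pearson(predicted, affinity):.3f}")
```

Each prediction comes from a model that never saw the held-out complex, which is what makes the reported correlation an estimate of performance on unseen data. With small categories, this estimate can still vary considerably from one dataset to another.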
Project description:Empirical scoring functions used in protein-ligand docking calculations are typically trained on a dataset of complexes with known affinities with the aim of generalizing across different docking applications. We report a novel method of scoring-function optimization that supports the use of additional information to constrain scoring function parameters, which can be used to focus a scoring function's training towards a particular application, such as screening enrichment. The approach combines multiple instance learning, positive data in the form of ligands of protein binding sites of known and unknown affinity and binding geometry, and negative (decoy) data of ligands thought not to bind particular protein binding sites or known not to bind in particular geometries. Performance of the method for the Surflex-Dock scoring function is shown in cross-validation studies and in eight blind test cases. Tuned functions optimized with a sufficient amount of data exhibited either improved or undiminished screening performance relative to the original function across all eight complexes. Analysis of the changes to the scoring function suggests that modifications can be learned that are related to protein-specific features such as active-site mobility.
Project description:Binding affinity prediction is one of the most critical components of computer-aided structure-based drug design. Despite advances in first-principle methods for predicting binding affinity, empirical scoring functions that are fast and only relatively accurate are still widely used in structure-based drug design. With the increasing availability of X-ray crystallographic structures in the Protein Data Bank and the continuing application of biophysical methods such as isothermal titration calorimetry to measure thermodynamic parameters contributing to binding free energy, sufficient experimental data exist that scoring functions can now be derived by separating enthalpic (ΔH) and entropic (TΔS) contributions to binding free energy (ΔG). PHOENIX, a scoring function to predict binding affinities of protein-ligand complexes, utilizes the increasing availability of experimental data to improve binding affinity predictions through the following: model training and testing using high-resolution crystallographic data to minimize structural noise; independent models of enthalpic and entropic contributions, fitted to thermodynamic parameters assumed to be thermodynamically biased, to calculate binding free energy; and use of shape and volume descriptors to better capture entropic contributions. A set of 42 descriptors and 112 protein-ligand complexes were used to derive functions using partial least squares for change of enthalpy (ΔH) and change of entropy (TΔS) to calculate change of binding free energy (ΔG), resulting in a predictive r^2 (r_pred^2) of 0.55 and a standard error (SE) of 1.34 kcal/mol. External validation using the 2009 version of the PDBbind "refined set" (n = 1612) resulted in a Pearson correlation coefficient (R_p) of 0.575 and a mean error (ME) of 1.41 pKd units. Enthalpy and entropy predictions were of limited accuracy individually. However, their difference resulted in a relatively accurate binding free energy. 
While the development of an accurate and applicable scoring function was an objective of this study, the main focus was evaluation of the use of high-resolution X-ray crystal structures with high-quality thermodynamic parameters from isothermal titration calorimetry for scoring function development. With the increasing application of structure-based methods in molecular design, this study suggests that using high-resolution crystal structures, separating enthalpy and entropy contributions to binding free energy, and including descriptors to better capture entropic contributions may prove to be effective strategies toward rapid and accurate calculation of binding affinity.
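The combination step described above relies on the standard thermodynamic identity ΔG = ΔH − TΔS, and the comparison in pKd units relies on pKd = −ΔG / (RT ln 10). The Python sketch below shows only this arithmetic, not the PHOENIX descriptors or fitted models. The function names and the numerical inputs are hypothetical, and 298.15 K is an assumed temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def free_energy_from_components(delta_h: float, t_delta_s: float) -> float:
    """Combine separately predicted enthalpic and entropic terms (kcal/mol)."""
    return delta_h - t_delta_s


def pkd_from_free_energy(delta_g: float, temperature_k: float = 298.15) -> float:
    """Convert a binding free energy in kcal/mol to pKd = -log10(Kd)."""
    return -delta_g / (R_KCAL * temperature_k * math.log(10))


# Hypothetical predictions from two independent models.
delta_h = -12.0    # predicted enthalpy change, kcal/mol
t_delta_s = -3.0   # predicted entropic term T*dS, kcal/mol

delta_g = free_energy_from_components(delta_h, t_delta_s)
print(f"dG = {delta_g:.1f} kcal/mol, pKd = {pkd_from_free_energy(delta_g):.2f}")
```

If the errors in the two component predictions are correlated, they partly cancel in the difference, which is one way the combined ΔG can be more accurate than either term alone.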
Project description:The development of inhibitors targeting protein-protein interactions (PPI), and the prediction of their binding affinities, still represent a major challenge in drug discovery efforts. This work reports the application of a predictive non-empirical model of inhibitory activity for PPI inhibitors, exemplified here for small molecules targeting the menin-mixed lineage leukemia (MLL) interaction. Systematic ab initio analysis of menin-inhibitor complexes was performed, revealing the physical nature of these interactions. Notably, the non-empirical protein-ligand interaction energy comprising electrostatic multipole and approximate dispersion terms (E^{(10)}_{El,MTP} + E_{Das}) produced a remarkable correlation with experimentally measured inhibitory activities and enabled accurate activity prediction for new menin-MLL inhibitors. Importantly, this relatively simple and computationally affordable non-empirical interaction energy model outperformed binding affinity predictions derived from commonly used empirical scoring functions. This study demonstrates the high relevance of the non-empirical model we developed for binding affinity prediction of inhibitors targeting protein-protein interactions, which are difficult to predict using empirical scoring functions.
Project description:The effects of solvation and entropy play a critical role in determining the binding free energy in protein-ligand interactions. Despite their good balance between speed and accuracy, no current knowledge-based scoring functions account for the effects of solvation and configurational entropy explicitly, owing to the difficulty in deriving the corresponding pair potentials and the resulting double-counting problem. In the present work, we have included the solvation effect and configurational entropy in the knowledge-based scoring function by an iterative method. The newly developed scoring function has yielded a success rate of 91% in identifying near-native binding modes on Wang et al.'s benchmark of 100 diverse protein-ligand complexes. The results have been compared with the results of 15 other scoring functions for validation. In binding affinity prediction, our scoring function has yielded a correlation of R^2 = 0.76 between the predicted binding scores and the experimentally measured binding affinities on the PMF validation sets of 77 diverse complexes. The results have been compared with the R^2 values of four other well-known knowledge-based scoring functions. Finally, our scoring function was also validated on the large PDBbind database of 1299 protein-ligand complexes and yielded a correlation coefficient of 0.474. The present computational model can be applied to other scoring functions to account for solvation and entropic effects.
Project description:We describe binding free energy calculations in the D3R Grand Challenge 2015 for blind prediction of the binding affinities of 180 ligands to Hsp90. The present D3R challenge was built around experimental datasets involving heat shock protein (Hsp) 90, an ATP-dependent molecular chaperone that is an important anticancer drug target. The Hsp90 ATP binding site is known to be a challenging target for accurate calculations of ligand binding affinities because of the ligand-dependent conformational changes in the binding site, the presence of ordered waters, and the broad chemical diversity of ligands that can bind at this site. Our primary focus here is to distinguish binders from nonbinders. Large-scale absolute binding free energy calculations that cover over 3000 protein-ligand complexes were performed using the BEDAM method, starting from docked structures generated by Glide docking. Although the ligand dataset in this study resembles an intermediate- to late-stage lead optimization project, whereas the BEDAM method was developed mainly for early-stage virtual screening of hit molecules, BEDAM binding free energy scoring resulted in a moderate enrichment of ligand screening against this challenging drug target. The results show that a statistical-mechanics-based free energy method such as BEDAM, starting from docked poses, offers better enrichment than classical docking scoring functions and rescoring methods such as Prime MM-GBSA for the Hsp90 data set in this blind challenge. Importantly, among the three methods tested here, only the mean value of the BEDAM binding free energy scores is able to separate the large group of binders from the small group of nonbinders, with a gap of 2.4 kcal/mol. None of the three methods that we tested provided an accurate ranking of the affinities of the 147 active compounds. We discuss the possible sources of errors in the binding free energy calculations. 
The study suggests that BEDAM can be used strategically to discriminate binders from nonbinders in virtual screening and to more accurately predict the ligand binding modes prior to the more computationally expensive FEP calculations of binding affinity.
Project description:We discuss the effectiveness of existing methods for understanding the forces driving the formation of specific protein-DNA complexes. Theoretical approaches using the Poisson-Boltzmann (PB) equation to analyse interactions between these highly charged macromolecules to form known structures are contrasted with an empirical approach that analyses the effects of salt on the stability of these complexes and assumes that release of counter-ions associated with the free DNA plays the dominant role in their formation. According to this counter-ion condensation (CC) concept, the salt-dependent part of the Gibbs energy of binding, which is defined as the electrostatic component, is fully entropic and its dependence on the salt concentration represents the number of ionic contacts present in the complex. It is shown that although this electrostatic component provides the majority of the Gibbs energy of complex formation and does not depend on the DNA sequence, the salt-independent part of the Gibbs energy--usually regarded as non-electrostatic--is sequence specific. The CC approach thus has considerable practical value for studying protein/DNA complexes, while practical applications of PB analysis have yet to demonstrate their merit.
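The salt-dependence analysis described above is commonly carried out by fitting log Ka against log[salt]. The Python sketch below illustrates that fit under stated assumptions: the slope is taken to equal −Zψ, where Z is the number of ionic contacts and ψ is the fraction of a counter-ion released per phosphate, and the intercept at 1 M salt is taken as the salt-independent part of the binding. The value ψ = 0.88 passed in the example is a figure often quoted for long double-stranded DNA and should be treated as an assumption. The data are synthetic and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def salt_dependence(salt_molar, log_ka, psi, temperature_k=298.15):
    """Split binding into salt-dependent and salt-independent parts.

    Returns (ionic_contacts, dg_salt_independent_kcal), assuming
    slope = -Z * psi and that the intercept is log10(Ka) at 1 M salt.
    """
    log_salt = [math.log10(c) for c in salt_molar]
    slope, intercept = fit_line(log_salt, log_ka)
    ionic_contacts = -slope / psi
    dg_salt_independent = -R_KCAL * temperature_k * math.log(10) * intercept
    return ionic_contacts, dg_salt_independent


# Synthetic titration data, for illustration only.
salt = [0.05, 0.10, 0.20, 0.40]          # salt concentration, M
log_ka = [10.51, 9.00, 7.49, 5.99]       # log10 of the association constant

contacts, dg_nonel = salt_dependence(salt, log_ka, psi=0.88)
print(f"ionic contacts ~ {contacts:.1f}, salt-independent dG ~ {dg_nonel:.2f} kcal/mol")
```

Extrapolating to 1 M salt is the usual convention for separating the two components, but it is an extrapolation outside the measured range, so the salt-independent value is sensitive to the quality of the fit.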