Project description:We report on a combined atomistic molecular dynamics simulation and implicit solvent analysis of a generic hydrophobic pocket-ligand (host-guest) system. The approaching ligand induces complex wetting-dewetting transitions in the weakly solvated pocket. The transitions lead to bimodal solvent fluctuations which govern magnitude and range of the pocket-ligand attraction. A recently developed implicit water model, based on the minimization of a geometric functional, captures the sensitive aqueous interface response to the concave-convex pocket-ligand configuration semiquantitatively.
Project description:Designing small-molecule-binding proteins, such as enzymes and biosensors, is crucial in protein biology and bioengineering. Generating high-fidelity protein pockets (areas where proteins interact with ligand molecules) is challenging due to complex interactions between ligand molecules and proteins, flexibility of ligand molecules and amino acid side chains, and complex sequence-structure dependencies. Here, we introduce PocketGen, a deep generative method for generating the residue sequence and the full-atom structure within the protein pocket region that leverages sequence-structure consistency. PocketGen consists of a bilevel graph transformer for structural encoding and a sequence refinement module that uses a protein language model (pLM) for sequence prediction. The bilevel graph transformer captures interactions at multiple granularities (atom-level and residue/ligand-level) and aspects (intra-protein and protein-ligand) with bilevel attention mechanisms. For sequence refinement, a structural adapter using cross-attention is integrated into a pLM to ensure structure-sequence consistency. During training, only the adapter is fine-tuned, while the other layers of the pLM remain unchanged. Experiments show that PocketGen can efficiently generate protein pockets with higher binding affinity and validity than state-of-the-art methods. PocketGen is ten times faster than physics-based methods and achieves a 95% success rate (percentage of generated pockets with higher binding affinity than reference pockets) with over 64% amino acid recovery rate.
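The two evaluation metrics named above can be illustrated with a minimal sketch. It is not PocketGen's evaluation code. The function names are hypothetical, affinities are assumed to be numbers where larger means stronger binding (negate docking scores where lower is better), and recovery is assumed to be position-wise identity between equal-length pocket sequences.

```python
from typing import Sequence


def success_rate(generated: Sequence[float], reference: Sequence[float]) -> float:
    """Percentage of generated pockets whose affinity exceeds the paired reference.

    Assumption: larger values mean stronger binding.
    """
    if len(generated) != len(reference) or not generated:
        raise ValueError("need two non-empty, equally long affinity lists")
    wins = sum(g > r for g, r in zip(generated, reference))
    return 100.0 * wins / len(generated)


def amino_acid_recovery(generated_seq: str, reference_seq: str) -> float:
    """Percentage of pocket positions where the generated residue matches the reference.

    Assumption: both sequences cover the same pocket positions in the same order.
    """
    if len(generated_seq) != len(reference_seq) or not reference_seq:
        raise ValueError("need two non-empty, equally long sequences")
    matches = sum(g == r for g, r in zip(generated_seq, reference_seq))
    return 100.0 * matches / len(reference_seq)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    print(success_rate([9.0, 7.5, 8.0, 6.0], [8.0, 7.0, 8.5, 5.0]))
    print(amino_acid_recovery("ACDEFG", "ACDQFG"))
```

With these definitions, a 95% success rate means that 95 of every 100 generated pockets score better than their reference pocket.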
Project description:Biological processes often depend on protein-ligand binding events, yet accurate calculation of the associated energetics remains a significant challenge of central importance to structure-based drug design. Recently, we have proposed that the displacement of unfavorable waters by the ligand, replacing them with groups complementary to the protein surface, is the principal driving force for protein-ligand binding, and we have introduced the WaterMap method to account for this effect. However, in spite of the adage "nature abhors vacuum," one can occasionally observe situations in which a portion of the receptor active site is so unfavorable for water molecules that a void is formed there. In this paper, we demonstrate that the presence of dry regions in the receptor has a nontrivial effect on ligand binding affinity, and suggest that such regions may represent a general motif for molecular recognition between the dry region in the receptor and the hydrophobic groups in the ligands. With the introduction of a term attributable to the occupation of the dry regions by ligand atoms, combined with the WaterMap calculation, we obtain excellent agreement with experiment for the prediction of relative binding affinities for a number of congeneric ligand series binding to the major urinary protein receptor. In addition, WaterMap when combined with the cavity contribution is more predictive than at least one specific implementation [Abel R, Young T, Farid R, Berne BJ, Friesner RA (2008) J Am Chem Soc 130:2817-2831] of the popular MM-GBSA approach to binding affinity calculation.
Project description:Mutations that arise in HIV-1 protease after exposure to various HIV-1 protease inhibitors have proved to be a difficult aspect in the treatment of HIV. Mutations in the binding pocket of the protease can prevent the protease inhibitor from binding to the protein effectively. In the present study, the crystal structures of 68 HIV-1 proteases complexed with one of the nine FDA-approved protease inhibitors from the Protein Data Bank (PDB) were analyzed by (a) identifying the mutational changes with the aid of a developed mutation map and (b) correlating the structure of the binding pockets with the complexed inhibitors. The mutations of each crystal structure were identified by comparing the amino acid sequence of each structure against the HIV-1 wild-type strain HXB2. These mutations were visually presented in the form of a mutation map to analyze mutation patterns corresponding to each protease inhibitor. The crystal structure mutation patterns of each inhibitor (in vitro) were compared against the mutation patterns observed in in vivo data. The in vitro mutation patterns were found to be representative of most of the major in vivo mutations. We then performed a data mining analysis of the binding pockets from each crystal structure in terms of their chemical descriptors to identify important structural features of the HIV-1 protease protein with respect to the binding conformation of the HIV-1 protease inhibitors. The data mining analysis was performed using several classification techniques: Random Forest (RF), linear discriminant analysis (LDA), and logistic regression (LR). We developed two hybrid models, RF-LDA and RF-LR. Random Forest was used as a feature selection proxy, reducing the descriptor space to a few of the most relevant descriptors determined by the classifier. These descriptors were then used to develop the subsequent LDA, LR, and hierarchical classification models.
Clustering the binding pockets on the descriptors selected for the optimal classification models reveals conformational similarities among the ligands in each cluster. This study provides important information for understanding structural features of HIV-1 protease that cannot be studied using existing in vivo genomic data sets.
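The mutation-identification step described in this abstract (comparing each structure's sequence against a wild-type reference and tallying the differences per inhibitor) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The function names, the toy sequences, and the inhibitor labels are hypothetical, and the sequences are assumed to be pre-aligned to the reference and of equal length.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def call_mutations(reference: str, sample: str, start: int = 1) -> List[str]:
    """List substitutions such as 'V82A' between two pre-aligned sequences.

    Assumption: equal-length, already aligned sequences; positions where the
    sample has a gap ('-') or an unknown residue ('X') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for offset, (ref, obs) in enumerate(zip(reference, sample)):
        if obs in "-X" or ref == obs:
            continue
        mutations.append(f"{ref}{start + offset}{obs}")
    return mutations


def mutation_map(
    reference: str, structures: Iterable[Tuple[str, str, str]]
) -> Dict[str, Counter]:
    """Count how often each mutation occurs among the structures of each inhibitor.

    `structures` yields (structure_id, inhibitor, sequence) tuples.
    """
    per_inhibitor: Dict[str, Counter] = defaultdict(Counter)
    for _structure_id, inhibitor, sequence in structures:
        per_inhibitor[inhibitor].update(call_mutations(reference, sequence))
    return per_inhibitor


if __name__ == "__main__":
    # Toy data for illustration only; this is not the HXB2 protease sequence.
    toy_reference = "ACDEFG"
    toy_structures = [
        ("s1", "drugA", "ACDEYG"),
        ("s2", "drugA", "AVDEYG"),
        ("s3", "drugB", "ACDEFG"),
    ]
    for inhibitor, counts in mutation_map(toy_reference, toy_structures).items():
        print(inhibitor, dict(counts))
```

The per-inhibitor counts are the kind of table that a mutation map visualizes. Plotting and the comparison against in vivo data are left out of this sketch.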
Project description:Many research groups and institutions have created a variety of databases curating experimental and predicted data related to protein-ligand binding. The landscape of available databases is dynamic, with new databases emerging and established databases becoming defunct. Here, we review the current state of databases that contain binding pockets and protein-ligand binding interactions. We have compiled a list of such databases, fifty-three of which are currently available for use. We discuss variation in how binding pockets are defined and summarize pocket-finding methods. We organize the fifty-three databases into subgroups based on goals and contents, and describe standard use cases. We also illustrate that pockets within the same protein are characterized differently across different databases. Finally, we assess critical issues of sustainability, accessibility and redundancy.
Project description:Motivation: With the rapid increase of the structural data of biomolecular complexes, novel structural analysis methods have to be devised with high-throughput capacity to handle immense data input and to construct massive networks at the minimal computational cost. Moreover, novel methods should be capable of handling a broad range of molecular structural sizes and chemical natures, cognisant of the conformational and electrostatic bases of molecular recognition, and sufficiently accurate to enable contextually relevant biological inferences. Results: A novel molecular topology comparison method was developed and tested. The method was tested for both ligand and binding pocket similarity analyses and a PDB-wide ligand topological similarity map was computed. Conclusion: The unprecedentedly wide scope of ligand definition and large-scale topological similarity mapping can provide very robust tools, of performance unmatched by the present alignment-based methods. The method remarkably shows potential for application for scaffold hopping purposes. It also opens new frontiers in the areas of ligand-mediated protein connectivity, ligand-based molecular phylogeny, target fishing, and off-target predictions. Graphical abstract: A novel molecular topology comparison method based on a combined shape distribution and charge binning scheme is presented.
Project description:The TRAnsient Pockets in Proteins (TRAPP) webserver provides an automated workflow that allows users to explore the dynamics of a protein binding site and to detect pockets or sub-pockets that may transiently open due to protein internal motion. These transient or cryptic sub-pockets may be of interest in the design and optimization of small molecular inhibitors for a protein target of interest. The TRAPP workflow consists of the following three modules: (i) TRAPP structure: generation of an ensemble of structures using one or more of four possible molecular simulation methods; (ii) TRAPP analysis: superposition and clustering of the binding site conformations either in an ensemble of structures generated in step (i) or in PDB structures or trajectories uploaded by the user; and (iii) TRAPP pocket: detection, analysis, and visualization of the binding pocket dynamics and characteristics, such as volume, solvent-exposed area or properties of surrounding residues. A standard sequence conservation score per residue or a differential score per residue, for comparing on- and off-targets, can be calculated and displayed on the binding pocket for an uploaded multiple sequence alignment file, and known protein sequence annotations can be displayed simultaneously. The TRAPP webserver is freely available at http://trapp.h-its.org.
Project description:While myriad non-coding RNAs are known to be essential in cellular processes and misregulated in diseases, the development of RNA-targeted small molecule probes has met with limited success. To elucidate the guiding principles for selective small molecule/RNA recognition, we analyzed cheminformatic and shape-based descriptors for 104 RNA-targeted ligands with demonstrated biological activity (RNA-targeted BIoactive ligaNd Database, R-BIND). We then compared R-BIND to both FDA-approved small molecule drugs and RNA ligands without reported bioactivity. Several striking trends emerged for bioactive RNA ligands, including: 1) compliance with medicinal chemistry rules, 2) distinctive structural features, and 3) enrichment in rod-like shapes over others. This work provides unique insights that directly facilitate the selection and synthesis of RNA-targeted libraries with the goal of efficiently identifying selective small molecule ligands for therapeutically relevant RNAs.
Project description:Recent successes in developing small molecule degraders that act through the ubiquitin system have spurred efforts to extend this technology to other mechanisms, including the autophagosomal-lysosomal pathway. Therefore, reports of autophagosome tethering compounds (ATTECs) have received considerable attention from the drug development community. ATTECs are based on the recruitment of targets to LC3/GABARAP, a family of ubiquitin-like proteins that presumably bind to the autophagosome membrane and tether cargo-loaded autophagy receptors into the autophagosome. In this work, we rigorously tested the target engagement of the reported ATTECs to validate the existing LC3/GABARAP ligands. Surprisingly, we were unable to detect interaction with their designated target LC3 using a diversity of biophysical methods. Intrigued by the idea of developing ATTECs, we evaluated the ligandability of LC3/GABARAP by in silico docking and large-scale crystallographic fragment screening. Data based on approximately 1000 crystal structures revealed that most fragments bound to the HP2 but not to the HP1 pocket within the LIR docking site, suggesting a favorable ligandability of HP2. Through this study, we identified diverse validated LC3/GABARAP ligands and fragments as starting points for chemical probe and ATTEC development.
Project description:Geometrical analysis of protein tertiary substructures has been an effective approach to predicting protein binding sites. This article presents the Protemot web server, which predicts protein binding sites based on structural templates automatically extracted from the crystal structures of protein-ligand complexes in the PDB (Protein Data Bank). The automatic extraction mechanism is essential for creating and maintaining a comprehensive template library that keeps pace with new PDB releases as the number of entries continues to grow rapidly. The design of Protemot is also distinctive in the mechanism it employs to expedite the analysis process that matches the tertiary substructures on the contour of the query protein with the templates in the library. This expediting mechanism is essential for providing a reasonable response time to the user as the template library grows along with the PDB. This article also reports the experiments conducted to evaluate the prediction power of the Protemot web server. Experimental results show that Protemot delivers greater prediction power than a web server based on a manually curated template library with an insufficient number of entries. Availability: http://protemot.csie.ntu.edu.tw/step1.cgi http://bioinfo.mc.ntu.edu.tw/protemot/step1.cgi.