Virtual screening approach to identifying influenza virus neuraminidase inhibitors using molecular docking combined with machine-learning-based scoring function.
ABSTRACT: In recent years, an epidemic of the highly pathogenic avian influenza H7N9 virus has persisted in China, with a high mortality rate. To develop novel anti-influenza therapies, we have constructed a machine-learning-based scoring function (RF-NA-Score) for the effective virtual screening of lead compounds targeting the viral neuraminidase (NA) protein. RF-NA-Score is more accurate than RF-Score, with a root-mean-square error of 1.46, Pearson's correlation coefficient of 0.707, and Spearman's rank correlation coefficient of 0.707 in a 5-fold cross-validation study. The performance of RF-NA-Score in a docking-based virtual screening of NA inhibitors was evaluated with a dataset containing 281 NA inhibitors and 322 noninhibitors. Compared with other docking-rescoring virtual screening strategies, rescoring with RF-NA-Score significantly improved the efficiency of virtual screening, and a strategy that averaged the scores given by RF-NA-Score, based on the binding conformations predicted with AutoDock, AutoDock Vina, and LeDock, was shown to be the best strategy. This strategy was then applied to the virtual screening of NA inhibitors in the SPECS database. The 100 selected compounds were tested in an in vitro H7N9 NA inhibition assay, and two compounds with novel scaffolds showed moderate inhibitory activities. These results indicate that RF-NA-Score improves the efficiency of virtual screening for NA inhibitors, and can be used successfully to identify new NA inhibitor scaffolds. Scoring functions specific for other drug targets could also be established with the same method.
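The averaging strategy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout, and the convention that a higher rescored value means stronger predicted binding are all assumptions made for the example.

```python
from statistics import mean

def consensus_scores(scores_by_program):
    """Average per-compound rescoring values across docking programs.

    scores_by_program maps a program name to {compound_id: score}
    (hypothetical layout). Only compounds scored under every program
    are kept, so each average uses the same number of poses.
    """
    tables = list(scores_by_program.values())
    shared = set(tables[0]).intersection(*tables[1:])
    return {cid: mean(t[cid] for t in tables) for cid in shared}

def rank_compounds(consensus, top_n):
    # Assumption: a higher score means stronger predicted binding.
    return sorted(consensus, key=consensus.get, reverse=True)[:top_n]

scores = {
    "AutoDock": {"c1": 6.0, "c2": 5.0, "c3": 7.0},
    "AutoDock Vina": {"c1": 7.0, "c2": 4.0, "c3": 7.5},
    "LeDock": {"c1": 8.0, "c2": 3.0},  # c3 was not docked here
}
consensus = consensus_scores(scores)
top = rank_compounds(consensus, top_n=1)
```

Dropping compounds that lack a pose from any one program is a design choice for the sketch; averaging over whichever poses are available would be an equally plausible alternative.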
Project description:Rescoring is a simple approach that could, in theory, improve the original docking results. In this study, AutoDock Vina was used as the docking engine, and three other scoring functions besides the original Vina scoring function, as well as their combinations as consensus scoring functions, were employed to explore the effect of rescoring on virtual screenings performed on diverse targets. Rescoring by DrugScore produced the largest number of cases with significant changes in screening power. Thus, the DrugScore results were used to build a simple model based on two binding site descriptors that could predict possible improvement by DrugScore rescoring. Furthermore, the screening power of all rescoring approaches, as well as that of the original AutoDock Vina docking results, generally correlated with the Maximum Theoretical Shape Complementarity (MTSC) and the Maximum Distance from Center of Mass and all Alpha spheres (MDCMA). Therefore, it was suggested that, with a more complete set of binding site descriptors, it could be possible to find a robust relationship between binding site descriptors and the response to certain molecular docking programs and scoring functions. The results could be helpful for future research involving virtual screening with AutoDock Vina and/or rescoring with DrugScore.
Project description:The failure of default scoring functions to ensure virtual screening enrichment is a persistent problem for the molecular docking algorithms used in structure-based drug discovery. To remedy this problem, elaborate rescoring and postprocessing schemes have been developed with varying degrees of success, specificity, and cost. Negative image-based rescoring (R-NiB) has been shown to improve flexible docking performance markedly with a variety of drug targets. The yield improvement is achieved by comparing the alternative docking poses against the negative image of the target protein's ligand-binding cavity. In other words, the shape and electrostatics of the binding pocket are used directly in the similarity comparison to rank the explicit docking poses. Here, the PANTHER/ShaEP-based R-NiB methodology is tested with six popular docking programs, GLIDE, PLANTS, GOLD, DOCK, AUTODOCK, and AUTODOCK VINA, using five validated benchmark sets. Overall, the results indicate that R-NiB outperforms the default docking scoring consistently and inexpensively, demonstrating that the methodology is ready for wide-scale virtual screening usage.
Project description:BACKGROUND:Small-molecule docking is an important tool in studying receptor-ligand interactions and in identifying potential drug candidates. Previously, we developed a software tool (DOVIS) to perform large-scale virtual screening of small molecules in parallel on Linux clusters, using AutoDock 3.05 as the docking engine. DOVIS enables the seamless screening of millions of compounds on high-performance computing platforms. In this paper, we report significant advances in the software implementation of DOVIS 2.0, including enhanced screening capability, improved file system efficiency, and extended usability. IMPLEMENTATION:To keep DOVIS up-to-date, we upgraded the software's docking engine to the more accurate AutoDock 4.0 code. We developed a new parallelization scheme to improve runtime efficiency and modified the AutoDock code to reduce excessive file operations during large-scale virtual screening jobs. We also implemented an algorithm to output docked ligands in an industry standard format, sd-file format, which can be easily interfaced with other modeling programs. Finally, we constructed a wrapper-script interface to enable automatic rescoring of docked ligands by arbitrarily selected third-party scoring programs. CONCLUSION:The significance of the new DOVIS 2.0 software compared with the previous version lies in its improved performance and usability. The new version makes the computation highly efficient by automating load balancing, significantly reducing excessive file operations by more than 95%, providing outputs that conform to industry standard sd-file format, and providing a general wrapper-script interface for rescoring of docked ligands. The new DOVIS 2.0 package is freely available to the public under the GNU General Public License.
Project description:Protein-ligand docking is a key computational method in the design of starting points for the drug discovery process. We are motivated by the desire to automate large-scale docking using our popular docking engine idock and thus have developed a publicly accessible web platform called istar. Without tedious software installation, users can submit jobs using our website. Our istar website supports 1) filtering ligands by desired molecular properties and previewing the number of ligands to dock, 2) monitoring job progress in real time, and 3) visualizing ligand conformations and outputting free energy and ligand efficiency predicted by idock, binding affinity predicted by RF-Score, putative hydrogen bonds, and supplier information for easy purchase, three useful features commonly lacking on other online docking platforms like DOCK Blaster or iScreen. We have collected 17,224,424 ligands from the All Clean subset of the ZINC database, and revamped our docking engine idock to version 2.0, further improving docking speed and accuracy, and integrating RF-Score as an alternative rescoring function. To compare idock 2.0 with the state-of-the-art AutoDock Vina 1.1.2, we have carried out a rescoring benchmark and a redocking benchmark on the 2,897 and 343 protein-ligand complexes of PDBbind v2012 refined set and CSAR NRC HiQ Set 24Sept2010 respectively, and an execution time benchmark on 12 diverse proteins and 3,000 ligands of different molecular weight. Results show that, under various scenarios, idock achieves comparable success rates while outperforming AutoDock Vina in terms of docking speed by at least 8.69 times and at most 37.51 times.
When evaluated on the PDBbind v2012 core set, our istar platform combined with RF-Score reproduces a Pearson's correlation coefficient and a Spearman's correlation coefficient as high as 0.855 and 0.859, respectively, between the experimental binding affinity and the predicted binding affinity of the docked conformation. istar is freely available at http://istar.cse.cuhk.edu.hk/idock.
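The agreement statistics quoted in these abstracts (root-mean-square error, Pearson's and Spearman's correlation coefficients between predicted and experimental affinities) can be computed as in the sketch below. It uses only the Python standard library; the function names and the toy numbers are illustrative assumptions and are unrelated to any benchmark reported above.

```python
from math import sqrt

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: a monotonic but non-linear relationship.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 4.0, 9.0, 16.0]
r_p = pearson(experimental, predicted)
r_s = spearman(experimental, predicted)
```

On the toy data the rank correlation is perfect while the linear correlation is slightly lower, which is why both are usually reported together.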
Project description:IMPORTANCE TO THE FIELD: Virtual screening is a computer-based technique for identifying promising compounds to bind to a target molecule of known structure. Given the rapidly increasing number of protein and nucleic acid structures, virtual screening continues to grow as an effective method for the discovery of new inhibitors and drug molecules. AREAS COVERED IN THIS REVIEW: We describe virtual screening methods that are available in the AutoDock suite of programs, and several of our successes in using AutoDock virtual screening in pharmaceutical lead discovery. WHAT THE READER WILL GAIN: A general overview of the challenges of virtual screening is presented, along with the tools available in the AutoDock suite of programs for addressing these challenges. TAKE HOME MESSAGE: Virtual screening is an effective tool for the discovery of compounds for use as leads in drug discovery, and the free, open source program AutoDock is an effective tool for virtual screening.
Project description:In this work, random forest (RF), support vector machine, k-nearest neighbor, and C4.5 decision tree methods were used to establish classification models for predicting whether an unknown molecule is an inhibitor of the human topoisomerase I (Top1) protein. All of these models achieved satisfactory results, with total prediction accuracies from 89.70% to 97.12%. Comparative analysis showed that the RF model gave the best predictive performance. Its parameters were further optimized to generate the best-performing RF model. At the same time, feature selection was implemented through a special RF procedure to choose, from 189 molecular descriptors, the properties most relevant to the inhibition of Top1. Subsequently, a ligand-based virtual screening of the Maybridge database was performed with the optimal RF model, and 596 hits were picked out. Then, 67 molecules with relative probability scores over 0.7 were selected based on the screening results. Next, these 67 molecules were docked to Top1 using AutoDock Vina. Finally, six top-ranked molecules with binding energies less than -10.0 kcal/mol were selected, and a common backbone, entirely different from that of existing Top1 inhibitors reported in the literature, was found.
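The two-stage funnel described above (a ligand-based classifier cutoff followed by a docking-energy cutoff) can be sketched as below. The callables `predict_probability` and `dock_energy` are hypothetical stand-ins for a trained classifier and a docking run; only the two thresholds (0.7 and -10.0 kcal/mol) come from the text.

```python
def two_stage_screen(library, predict_probability, dock_energy,
                     prob_cutoff=0.7, energy_cutoff=-10.0):
    """Return (molecule, energy) pairs that pass both filters.

    Stage 1 keeps molecules whose predicted inhibitor probability
    exceeds prob_cutoff. Stage 2 docks only those survivors and keeps
    the ones with a binding energy below energy_cutoff (kcal/mol).
    Results are ordered from most to least favourable energy.
    """
    shortlisted = [m for m in library if predict_probability(m) > prob_cutoff]
    docked = [(m, dock_energy(m)) for m in shortlisted]
    hits = [(m, e) for m, e in docked if e < energy_cutoff]
    return sorted(hits, key=lambda pair: pair[1])

# Toy stand-ins for the classifier and the docking program.
toy_probability = {"a": 0.90, "b": 0.60, "c": 0.75, "d": 0.80}
toy_energy = {"a": -10.5, "b": -12.0, "c": -9.0, "d": -11.2}

hits = two_stage_screen(
    library=["a", "b", "c", "d"],
    predict_probability=toy_probability.get,
    dock_energy=toy_energy.get,
)
```

Running the cheap classifier first means the expensive docking step is only paid for the shortlisted molecules; molecule "b" in the toy data is never docked even though its energy would have passed.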
Project description:Matrix metalloproteinases (MMPs) have distinctive roles in various physiological and pathological processes such as inflammatory diseases and cancer. This study explored the performance of eleven scoring functions (D-Score, G-Score, ChemScore, F-Score, PMF-Score, PoseScore, RankScore, DSX, and X-Score, and the scoring functions of AutoDock 4.1 and AutoDock Vina). Their performance was judged by calculating their correlations with the experimental binding affinities of 3D ligand-enzyme complexes of the MMP family. Furthermore, they were evaluated for their ability to rerank the results of a virtual screening study performed on a member of the MMP family (MMP-12). Enrichment factors at different levels and receiver operating characteristic (ROC) curves were used to assess their performance. Finally, we developed a PCA model from the best functions. Of the scoring functions evaluated, F-Score, DSX, and ChemScore were the best overall performers in predicting MMP-inhibitor binding affinities, while ChemScore, AutoDock, and DSX had the best discriminative power in virtual screening against the MMP-12 target. Consensus scoring did not show statistically significant superiority over the other scoring methods in the correlation study, while the PCA model, which consists of ChemScore, AutoDock, and DSX, improved overall enrichment. The outcome of this study could be useful for setting up a suitable scoring protocol, resulting in enrichment of MMP inhibitors.
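The two screening metrics named above can be illustrated with a short standard-library sketch. The enrichment factor follows the usual definition (hit rate in the top-ranked fraction divided by the hit rate in the whole library), and the ROC AUC is computed through its rank-statistic form. The function names are illustrative, and the sketch assumes that a higher score means a better-ranked compound, so docking energies would need to be negated first.

```python
def enrichment_factor(scores, labels, fraction):
    """EF at the given top fraction; labels are 1 (active) or 0 (inactive)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    hits_top = sum(label for _, label in ranked[:n_top])
    return (hits_top / n_top) / (sum(labels) / len(ranked))

def roc_auc(scores, labels):
    """Area under the ROC curve.

    Equals the probability that a randomly chosen active outscores a
    randomly chosen inactive, with ties counted as one half.
    """
    actives = [s for s, l in zip(scores, labels) if l == 1]
    inactives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((a > i) + 0.5 * (a == i) for a in actives for i in inactives)
    return wins / (len(actives) * len(inactives))

# Toy ranking: 3 actives among 10 compounds.
toy_scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(toy_scores, toy_labels, 0.2)
auc = roc_auc(toy_scores, toy_labels)
```

An enrichment factor of 1 and an AUC of 0.5 both correspond to random ranking, which gives a baseline against which the rescoring strategies in these abstracts are compared.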
Project description:In this study, we introduce a rescoring method to improve the accuracy of docking programs against mPGES-1. The rescoring method is the result of an extensive computational study in which different scoring functions and molecular descriptors were combined to develop consensus and rescoring methods. A total of 127 mPGES-1 inhibitors were collected from the literature and divided into a training set and an external test set. The 27 training set compounds were docked using the default settings of the AutoDock Vina, AutoDock, DOCK6, and GOLD programs. The programs showed low to moderate correlation with the experimental activities. To account for the contributions of the desolvation penalty and the conformational energy of the inhibitors, various molecular descriptors were calculated. The rescoring method was then developed as an empirical sum of the normalised values of the docking scores, LogP, and Nrotb. The results clearly indicated that LogP and Nrotb improve the predictions of these docking programs. The efficiency of the rescoring method was further validated using the 100 test set compounds. Accurate prediction of binding affinities for analogues of the same compounds is a major challenge for many existing docking programs; in the present study, the high correlation obtained between experimental and predicted pIC50 values for the test set compounds validates the efficiency of the scoring method.
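A rescoring function of the kind described above, a sum of normalised docking score, LogP, and Nrotb, might be sketched as follows. The abstract does not state the normalisation scheme, the weights, or the signs of the terms, so the min-max scaling and the default unit weights here are purely assumptions chosen to keep the example concrete.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rescore(docking_scores, logp, nrotb, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of normalised descriptors, one value per compound.

    The weights (and their signs) are hypothetical placeholders; in
    practice they would be fitted against training set activities.
    """
    columns = [min_max(docking_scores), min_max(logp), min_max(nrotb)]
    n = len(docking_scores)
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n)]

# Toy descriptors for three compounds.
combined = rescore(
    docking_scores=[-8.0, -6.0, -10.0],
    logp=[2.0, 4.0, 3.0],
    nrotb=[4, 8, 6],
)
```

Normalising each term first keeps descriptors with large numeric ranges from dominating the sum, which is the practical reason for combining normalised rather than raw values.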
Project description:Nwat-MMGBSA is a variant of MM-PB/GBSA based on the inclusion of a number of explicit water molecules that are the closest to the ligand in each frame of a molecular dynamics trajectory. This method demonstrated improved correlations between calculated and experimental binding energies for both protein-protein interactions and ligand-receptor complexes, in comparison to standard MM-GBSA. A protocol optimization, aimed at maximizing efficacy and efficiency, is discussed here, considering penicillopepsin, HIV1-protease, and BCL-XL as test cases. Calculations were performed in triplicate on both classic HPC environments and standard workstations equipped with a GPU card, and showed no statistical differences in the results. Likewise, no relevant differences in correlation with experiments were observed when performing Nwat-MMGBSA calculations on 4 or 1 ns long trajectories. A fully automatic workflow for structure-based virtual screening, running from library set-up through docking to Nwat-MMGBSA rescoring, was then developed. The protocol was tested against no rescoring and against standard MM-GBSA rescoring in a retrospective virtual screening of inhibitors of AmpC β-lactamase and of the Rac1-Tiam1 protein-protein interaction. In both cases, Nwat-MMGBSA rescoring provided a statistically significant increase in the ROC AUCs of between 20 and 30%, compared to docking scoring or to standard MM-GBSA rescoring.
Project description:blaVEB-1 is an integron-located extended-spectrum β-lactamase gene initially detected in Escherichia coli and Pseudomonas aeruginosa strains from south-east Asia. Several recent studies have reported that VEB-1-positive strains are highly resistant to ceftazidime, cefotaxime and aztreonam antibiotics. One strategy to overcome resistance involves administering antibiotics together with β-lactamase inhibitors during the treatment of infectious diseases. During this study, four VEB-1 β-lactamase inhibitors were identified using computer-aided drug design. The SWISS-MODEL tool was utilized to generate three-dimensional structures of VEB-1 β-lactamase, and the 3D model of VEB-1 was verified using the PROCHECK, ERRAT and VERIFY 3D programs. Virtual screening was performed by docking inhibitors obtained from the ZINC Database to the active site of the VEB-1 protein using AutoDock Vina software. Homology modeling studies were performed to obtain a three-dimensional structure of VEB-1 β-lactamase. The generated model was validated, and virtual screening of a large chemical ligand library with docking simulations was performed using AutoDock software with the ZINC database. On the basis of the dock-score, four molecules were subjected to ADME/TOX analysis, with ZINC4085364 emerging as the most potent inhibitor of the VEB-1 β-lactamase.