Project description:Computational (in silico) methods have been developed and widely applied to pharmacology hypothesis development and testing. These in silico methods include databases, quantitative structure-activity relationships, similarity searching, pharmacophores, homology models and other molecular modeling, machine learning, data mining, network analysis tools and data analysis tools that use a computer. Such methods have seen frequent use in the discovery and optimization of novel molecules with affinity to a target, the clarification of absorption, distribution, metabolism, excretion and toxicity properties as well as physicochemical characterization. The first part of this review discussed the methods that have been used for virtual ligand and target-based screening and profiling to predict biological activity. The aim of this second part of the review is to illustrate some of the varied applications of in silico methods for pharmacology in terms of the targets addressed. We will also discuss some of the advantages and disadvantages of in silico methods with respect to in vitro and in vivo methods for pharmacology research. Our conclusion is that the in silico pharmacology paradigm is ongoing and presents a rich array of opportunities that will assist in expediting the discovery of new targets, and ultimately lead to compounds with predicted biological activity for these novel targets.
Project description:Coronavirus disease 2019 (COVID-19) is a rapidly growing pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Its papain-like protease (SARS-CoV-2 PLpro) is a crucial target to halt virus replication. SARS-CoV PLpro and SARS-CoV-2 PLpro share 82.9% sequence identity and 100% sequence identity for the binding site reported to accommodate small molecules in SARS-CoV. The flexible key binding site residues Tyr269 and Gln270 for small-molecule recognition in SARS-CoV PLpro also exist in SARS-CoV-2 PLpro. This inspired us to use the reported small-molecule binders to SARS-CoV PLpro to generate a high-quality DEKOIS 2.0 benchmark set. Accordingly, we used them in a cross-benchmarking study against SARS-CoV-2 PLpro. As no SARS-CoV-2 PLpro structure complexed with a small-molecule ligand was publicly available at the time of manuscript submission, we built a homology model based on the ligand-bound SARS-CoV structure for benchmarking and docking purposes. Three publicly available docking tools (FRED, AutoDock Vina, and PLANTS) were benchmarked. All showed better-than-random performance, with FRED performing best against the built model. Detailed performance analysis via pROC-Chemotype plots showed a strong enrichment of the most potent bioactives in the early docking ranks. Cross-benchmarking against the X-ray structure complexed with a peptide-like inhibitor confirmed that FRED is the best-performing tool. Furthermore, we performed cross-benchmarking against the newly introduced X-ray structure complexed with a small-molecule ligand. Interestingly, its benchmarking profile and chemotype enrichment were comparable to those of the built model. Accordingly, we used FRED in a prospective virtual screen of the DrugBank database.
In conclusion, this study provides an example of how to harness a custom-made DEKOIS 2.0 benchmark set as an approach to enhance the virtual screening success rate against a vital target of the rapidly emerging pandemic.
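The notion of "better-than-random performance" and "enrichment in the early docking ranks" can be illustrated with two standard virtual-screening metrics. The following is a minimal sketch, not the study's own analysis: it computes a plain enrichment factor and a rank-based ROC AUC rather than the pROC-Chemotype plots mentioned above, and the scores and labels are invented toy values. It assumes that a lower docking score means a better rank.

```python
def enrichment_factor(scores, labels, fraction):
    """Enrichment of actives in the top `fraction` of the ranked list.

    Assumes lower docking score = better rank; labels are 1 (active) or 0 (decoy).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    n_top = max(1, round(fraction * len(ranked)))
    hits_top = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    return (hits_top / n_top) / (total_hits / len(ranked))


def roc_auc(scores, labels):
    """Probability that a random active outranks a random decoy (ties count half)."""
    actives = [s for s, l in zip(scores, labels) if l == 1]
    decoys = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((a < d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


# Hypothetical toy data: 3 actives and 7 decoys with made-up docking scores.
toy_scores = [-9.1, -8.7, -8.5, -8.0, -7.9, -7.5, -7.2, -7.0, -6.8, -6.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

ef_20 = enrichment_factor(toy_scores, toy_labels, fraction=0.2)
auc = roc_auc(toy_scores, toy_labels)
print(f"EF(20%) = {ef_20:.2f}, ROC AUC = {auc:.3f}")
```

An enrichment factor above 1 and an AUC above 0.5 both indicate a ranking that is better than random selection.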
Project description:Differential scanning fluorimetry (DSF) is an accessible, rapid, and economical biophysical technique that has seen many applications over the years, ranging from protein folding state detection to the identification of ligands that bind to the target protein. In this review, we discuss the theory, applications, and limitations of DSF, including the latest applications of DSF by ourselves and other researchers. We show that DSF is a powerful high-throughput tool in early drug discovery efforts. We place DSF in the context of other biophysical methods frequently used in drug discovery and highlight their benefits and downsides. We illustrate the uses of DSF in protein buffer optimization for stability, refolding, and crystallization purposes and provide several examples of each. We also show the use of DSF in a more downstream application, where it is used as an in vivo validation tool of ligand-target interaction in cell assays. Although DSF is a potent tool in buffer optimization and large chemical library screens, orthogonal techniques are recommended for ligand-binding validation and optimization, as DSF is prone to false positives and false negatives.
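One common way to read a DSF melt curve is to take the melting temperature (Tm) as the temperature of steepest fluorescence increase, and to report ligand binding as the shift in Tm relative to the protein alone. The sketch below illustrates that idea on synthetic data. The sigmoidal curve shape, the parameter values, and the function names are assumptions made for illustration and are not taken from the review.

```python
import math


def boltzmann_curve(temps, tm, slope=1.5, low=100.0, high=1000.0):
    """Synthetic two-state melt curve (illustrative parameters, not real data)."""
    return [low + (high - low) / (1.0 + math.exp((tm - t) / slope)) for t in temps]


def estimate_tm(temps, fluorescence):
    """Return the midpoint of the temperature interval with the steepest increase."""
    best_slope, best_tm = float("-inf"), None
    for i in range(len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope, best_tm = slope, 0.5 * (temps[i] + temps[i + 1])
    return best_tm


# Temperature ramp from 25 to 95 degrees C in 0.5 degree steps.
temps = [25.0 + 0.5 * i for i in range(141)]

# Hypothetical apo protein and ligand-bound protein curves.
apo_curve = boltzmann_curve(temps, tm=52.25)
bound_curve = boltzmann_curve(temps, tm=56.25)

tm_apo = estimate_tm(temps, apo_curve)
tm_bound = estimate_tm(temps, bound_curve)
delta_tm = tm_bound - tm_apo
print(f"Tm(apo) = {tm_apo:.2f}, Tm(bound) = {tm_bound:.2f}, shift = {delta_tm:.2f}")
```

Real melt curves are noisier than this synthetic example, so smoothing or a curve fit is usually applied before taking the derivative.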
Project description:In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches but is fundamentally limited by the accuracy with which protein-ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase-inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark that assesses how well different classes of docking and pose selection strategies recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures cocrystallized with 423 ATP-competitive ligands. We find that the docking methods biased by the cocrystallized ligand, utilizing shape overlap with or without maximum common substructure matching, are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance of generating a low root-mean-square deviation (RMSD) docking pose. Docking utilizing an approach that combines all three methods (Posit) into structures with the most similar cocrystallized ligands according to the maximum common substructure (MCS) proved to be the most efficient way to reproduce binding poses, achieving a success rate of 70.4% across all included systems.
The studied docking and pose selection strategies, which utilize the OpenEye Toolkits, were implemented into pipelines of the KinoML framework, allowing automated and reliable protein-ligand complex generation for future downstream machine learning tasks. Although focused on protein kinases, we believe that the general findings can also be transferred to other protein families.
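The success rate reported for pose reproduction depends on an RMSD cutoff between docked and experimentally observed ligand coordinates. The sketch below shows how such a rate could be computed when several receptor structures are docked per system and the best pose is kept. It makes several assumptions: the 2.0 Å threshold is a commonly used convention and is not stated in the abstract, the coordinates are invented, and the code does not use the OpenEye Toolkits or KinoML.

```python
import math


def rmsd(coords_a, coords_b):
    """In-place RMSD between two poses with identical atom ordering (no superposition)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(best_rmsds, threshold=2.0):
    """Fraction of systems whose best pose is within `threshold` (assumed 2.0 Angstrom)."""
    return sum(value <= threshold for value in best_rmsds) / len(best_rmsds)


def shift(coords, dx, dy, dz):
    """Translate every atom; used only to fabricate toy poses."""
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


# Hypothetical three-atom reference ligand and docked poses.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near_pose = shift(reference, 1.0, 0.0, 0.0)  # RMSD of 1.0
far_pose = shift(reference, 0.0, 3.0, 0.0)   # RMSD of 3.0

# Each system maps to the poses obtained from docking into different structures.
poses_by_system = {
    "system_1": [far_pose, near_pose],
    "system_2": [far_pose],
}
best_rmsds = [
    min(rmsd(pose, reference) for pose in poses)
    for poses in poses_by_system.values()
]
rate = success_rate(best_rmsds)
print(f"best RMSDs = {best_rmsds}, success rate = {rate:.2f}")
```

Taking the minimum over several receptor structures mirrors the observation that docking into multiple structures raises the chance of obtaining a low-RMSD pose. In a real benchmark the pose would have to be selected without knowledge of the reference.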
Project description:Benchmarking RNA-seq differential expression analysis methods using spike-in and simulated RNA-seq data has often yielded inconsistent results. The spike-in data, which were generated from the same bulk RNA sample, only represent technical variability, making the test results less reliable. We compared the performance of 12 differential expression analysis methods for RNA-seq data, including recent variants in widely used software packages, using both RNA spike-in data and simulation data for the negative binomial (NB) model. The performance of edgeR, DESeq2, and ROTS differed notably between the two benchmark tests. Each method was then tested under more extensive simulation conditions, demonstrating in particular the large impact of the proportion, dispersion, and balance of differentially expressed (DE) genes. DESeq2, a robust version of edgeR (edgeR.rb), and voom with TMM normalization (voom.tmm) and with sample weights (voom.sw) showed good overall performance regardless of the presence of outliers and the proportion of DE genes. The performance of RNA-seq DE gene analysis methods depended substantially on the benchmark used. Based on the simulation results, suitable methods were suggested for various test conditions.
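Simulation under the negative binomial model can be sketched as a gamma-Poisson mixture: a gene with mean μ and dispersion φ has variance μ + φμ², so counts are more variable than Poisson counts. The example below is a minimal illustration using only the Python standard library. The function names, parameter values, and the single-gene design are assumptions for illustration and do not reproduce the simulation settings of the study.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_nb(rng, mean, dispersion):
    """Negative binomial draw as a gamma-Poisson mixture (variance = mean + dispersion * mean^2)."""
    shape = 1.0 / dispersion
    scale = mean * dispersion  # gammavariate takes shape and scale, so E[rate] = mean
    return sample_poisson(rng, rng.gammavariate(shape, scale))


def simulate_gene(rng, mean, dispersion, fold_change, n_per_group):
    """Counts for one gene in two groups; fold_change != 1 makes it differentially expressed."""
    control = [sample_nb(rng, mean, dispersion) for _ in range(n_per_group)]
    treated = [sample_nb(rng, mean * fold_change, dispersion) for _ in range(n_per_group)]
    return control, treated


# Hypothetical settings; a large group size is used only to make the moments visible.
rng = random.Random(0)
control, treated = simulate_gene(
    rng, mean=20.0, dispersion=0.2, fold_change=2.0, n_per_group=5000
)

control_mean = statistics.fmean(control)
control_var = statistics.pvariance(control)
treated_mean = statistics.fmean(treated)
print(f"control mean = {control_mean:.1f}, variance = {control_var:.1f}")
print(f"treated mean = {treated_mean:.1f}")
```

Varying the dispersion, the fraction of genes given a fold change, and the direction of those changes corresponds to the dispersion, proportion, and balance of DE genes discussed above.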
Project description:The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e. bioassays) are frequently used for this purpose; however, experimental approaches are insufficient to explore novel drug-target interactions, mainly because of feasibility problems, as they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins along with the experimentally verified bio-interaction information to generate predictive models. Lately, sophisticated machine learning techniques have been applied in VS to improve predictive performance. The objective of this study is to examine and discuss the recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented number of studies on the application of deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e. representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges, and suggest future directions.
We believe that this survey will provide insight to the researchers working in the field of computational drug discovery in terms of comprehending and developing novel bio-prediction methods.
Project description:While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, thereby slowing the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: WelQrate Dataset Collection - we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure high-quality data; WelQrate Evaluation Framework - we propose a standardized model evaluation framework covering high-quality datasets, featurization, 3D conformation generation, evaluation metrics, and data splits, which provides reliable benchmarking for drug discovery experts conducting real-world virtual screening; Benchmarking - we evaluate model performance through various research questions using the WelQrate dataset collection, exploring the effects of different models, dataset quality, featurization methods, and data splitting strategies on the results. In summary, we recommend adopting our proposed WelQrate as the gold standard in small molecule drug discovery benchmarking. The WelQrate dataset collection, curation code, and experimental scripts are all publicly available at WelQrate.org.
Project description:Tuberculosis (TB) remains a serious threat to global public health, responsible for an estimated 1.5 million deaths in 2018. While therapeutics are available for this infection, slow-acting drugs, poor patient compliance, drug toxicity, and drug resistance require the discovery of novel TB drugs. Discovering new and more potent antibiotics that target novel TB protein targets is an attractive strategy towards controlling the global TB epidemic. In silico strategies can be applied at multiple stages of the drug discovery paradigm to expedite the identification of novel anti-TB therapeutics. In this paper, we discuss the current TB treatment, the emergence of drug resistance, and the effective application of computational tools to the different stages of TB drug discovery when combined with traditional biochemical methods. We also highlight the strengths and areas for improvement in in silico TB drug discovery research, as well as possible future perspectives in this field.
Project description:Pharmacology over the past 100 years has had a rich tradition of scientists with the ability to form qualitative or semi-quantitative relations between molecular structure and activity in cerebro. To test these hypotheses, they have consistently used traditional pharmacology tools such as in vivo and in vitro models. Increasingly over the last decade, however, computational (in silico) methods have been developed and applied to pharmacology hypothesis development and testing. These in silico methods include databases, quantitative structure-activity relationships, pharmacophores, homology models and other molecular modeling approaches, machine learning, data mining, network analysis tools and data analysis tools that use a computer. In silico methods are primarily used alongside the generation of in vitro data, both to create the model and to test it. Such models have seen frequent use in the discovery and optimization of novel molecules with affinity to a target, the clarification of absorption, distribution, metabolism, excretion and toxicity properties as well as physicochemical characterization. The aim of this review is to illustrate some of the in silico methods for pharmacology that are used in drug discovery. Further applications of these methods to specific targets and their limitations will be discussed in the second accompanying part of this review.
Project description:Cancer is a complex disease that relies on both oncogenic mutations and non-mutated genes for survival, phenomena termed oncogene and non-oncogene addiction, respectively. The need for more effective combination therapies to overcome drug resistance in oncology has been increasingly recognized, but the identification of potentially synergistic drugs at scale remains challenging. Here we propose a gene-expression-based approach, which uses the recurrent perturbation-transcript regulatory relationships inferred from a large compendium of chemical and genetic perturbation experiments across multiple cell lines, to generate testable hypotheses for combination therapies. These transcript-level recurrences were distinct from known compound-protein target counterparts, were reproducible in external datasets, and correlated with small-molecule sensitivity. We applied these recurrent relationships to predict synergistic drug pairs for cancer and experimentally confirmed two unexpected drug combinations in vitro. Our results corroborate a gene-expression-based strategy for combinatorial drug screening as a way to target non-mutated genes in complex diseases.