Project description: Coronavirus disease 2019 (COVID-19) is a rapidly growing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Its papain-like protease (SARS-CoV-2 PLpro) is a crucial target for halting virus replication. SARS-CoV PLpro and SARS-CoV-2 PLpro share 82.9% sequence identity, and 100% sequence identity in the binding site reported to accommodate small molecules in SARS-CoV. The flexible key binding-site residues for small-molecule recognition in SARS-CoV PLpro, Tyr269 and Gln270, are also present in SARS-CoV-2 PLpro. This inspired us to use the reported small-molecule binders of SARS-CoV PLpro to generate a high-quality DEKOIS 2.0 benchmark set, which we then used in a cross-benchmarking study against SARS-CoV-2 PLpro. Because no SARS-CoV-2 PLpro structure complexed with a small-molecule ligand was publicly available at the time of manuscript submission, we built a homology model based on the ligand-bound SARS-CoV structure for benchmarking and docking purposes. Three publicly available docking tools (FRED, AutoDock Vina, and PLANTS) were benchmarked. All performed better than random, with FRED performing best against the built model. Detailed performance analysis via pROC-Chemotype plots showed a strong enrichment of the most potent bioactives in the early docking ranks. Cross-benchmarking against the X-ray structure complexed with a peptide-like inhibitor confirmed that FRED is the best-performing tool. Furthermore, we performed cross-benchmarking against the newly introduced X-ray structure complexed with a small-molecule ligand. Interestingly, its benchmarking profile and chemotype enrichment were comparable to those of the built model. Accordingly, we used FRED in a prospective virtual screen of the DrugBank database.
In conclusion, this study provides an example of how a custom-made DEKOIS 2.0 benchmark set can be used to enhance the virtual screening success rate against a vital target of the rapidly emerging pandemic.
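
To make "better than random" concrete, the following is a minimal sketch of how a docking tool's ranking of actives versus decoys could be scored with a plain ROC AUC. It is not the pROC metric or the pROC-Chemotype analysis used in the study; the function name `roc_auc`, the toy scores, and the convention that a lower docking score means a better rank are all assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """Plain ROC AUC for docking scores (assumption: lower score = better rank).

    labels: 1 for a known active, 0 for a decoy.
    Computed as the fraction of (active, decoy) pairs in which the active
    is ranked ahead of the decoy; ties count as half a win.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical docking scores for 3 actives and 5 decoys (illustrative only).
scores = [-9.1, -8.7, -8.2, -7.9, -7.5, -7.0, -6.4, -6.0]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
# A random ranking gives an AUC of about 0.5; values above that indicate enrichment.
better_than_random = auc > 0.5
```

A value near 0.5 corresponds to random ranking, so any tool reported as better than random would score above that on such a benchmark set.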

Project description: In recent years, machine learning has transformed many aspects of the drug discovery process, including small-molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches but is fundamentally limited by the accuracy with which protein-ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase-inhibitor complex geometries for such scoring approaches, we present a kinase-centric docking benchmark that assesses how well different classes of docking and pose selection strategies recapitulate experimentally observed binding modes in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures cocrystallized with 423 ATP-competitive ligands. We find that docking methods biased by the cocrystallized ligand, utilizing shape overlap with or without maximum common substructure matching, are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance of generating a low root-mean-square deviation (RMSD) docking pose. Docking with an approach that combines all three methods (Posit) into structures with the most similar cocrystallized ligands according to the maximum common substructure (MCS) proved to be the most efficient way to reproduce binding poses, achieving a success rate of 70.4% across all included systems.
The studied docking and pose selection strategies, which utilize the OpenEye Toolkits, were implemented as pipelines in the KinoML framework, allowing automated and reliable protein-ligand complex generation for future downstream machine learning tasks. Although the benchmark focuses on protein kinases, we believe that the general findings can also be transferred to other protein families.
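
As an illustration of the RMSD-based success criterion mentioned above, here is a minimal sketch of scoring docking poses against a reference binding mode and taking the best pose over several receptor structures. The 2.0 Å cutoff, the function names, and the toy coordinates are assumptions for illustration; the description above does not state the threshold used, and this sketch does not use the OpenEye Toolkits or KinoML.

```python
import math


def rmsd(pose, reference):
    """In-place heavy-atom RMSD (no superposition), atoms matched by index."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference need the same, non-zero atom count")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))


def success_rate(best_rmsds, threshold=2.0):
    """Fraction of systems whose best pose is within `threshold` Å (assumed cutoff)."""
    return sum(r <= threshold for r in best_rmsds) / len(best_rmsds)


# Hypothetical reference ligand coordinates (Å) and poses from two receptor structures.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
poses = {
    "structure_A": [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)],  # shifted 1 Å
    "structure_B": [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0), (4.5, 1.5, 0.0)],  # shifted 3 Å
}

# Docking into multiple structures: keep the lowest-RMSD pose for this ligand.
best = min(rmsd(p, reference) for p in poses.values())

# Success rate over a hypothetical set of per-ligand best RMSDs.
rate = success_rate([best, 3.0, 1.9, 2.5])
```

Taking the minimum over structures shows why docking into several receptor conformations can only keep or improve the chance of obtaining a low-RMSD pose, although in a real workflow the best pose must be selected without knowing the reference.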

Project description: The identification of interactions between drugs/compounds and their targets is crucial for the development of new drugs. In vitro screening experiments (i.e., bioassays) are frequently used for this purpose; however, experimental approaches alone are insufficient to explore novel drug-target interactions, mainly because they are labour intensive, costly and time consuming. A computational field known as 'virtual screening' (VS) has emerged in the past decades to aid experimental drug discovery studies by statistically estimating unknown bio-interactions between compounds and biological targets. These methods use the physico-chemical and structural properties of compounds and/or target proteins, along with experimentally verified bio-interaction information, to generate predictive models. Lately, sophisticated machine learning techniques have been applied in VS to improve predictive performance. The objective of this study is to examine and discuss recent applications of machine learning techniques in VS, including deep learning, which became highly popular after giving rise to epochal developments in the fields of computer vision and natural language processing. The past 3 years have witnessed an unprecedented number of research studies applying deep learning in biomedicine, including computational drug discovery. In this review, we first describe the main instruments of VS methods, including compound and protein features (i.e., representations and descriptors), frequently used libraries and toolkits for VS, bioactivity databases and gold-standard data sets for system training and benchmarking. We subsequently review recent VS studies with a strong emphasis on deep learning applications. Finally, we discuss the present state of the field, including the current challenges, and suggest future directions.
We believe that this survey will provide insight to researchers working in the field of computational drug discovery, helping them comprehend and develop novel bio-prediction methods.

Dataset Information

Optimizing in silico drug discovery: simulation of connected differential expression signatures and applications to benchmarking
