Deep scaffold hopping with multimodal transformer neural networks
ABSTRACT: Scaffold hopping is a central task of modern medicinal chemistry for rational drug design: it aims to design molecules with novel scaffolds that share similar target biological activities with known hit molecules. Traditionally, scaffold hopping depends on searching databases of available compounds, which cannot exploit the vast chemical space. In this study, we re-formulated this task as a supervised molecule-to-molecule translation that generates hopped molecules novel in 2D structure but similar in 3D structure, inspired by the fact that candidate compounds bind to their targets through 3D conformations. To train the model efficiently, we curated over 50 thousand pairs of molecules with increased bioactivity, similar 3D structure, but different 2D structure from a public bioactivity database, spanning 40 kinases commonly investigated by medicinal chemists. Moreover, we designed a multimodal molecular transformer architecture that integrates molecular 3D conformers through a spatial graph neural network and protein sequence information through a Transformer. The trained DeepHop model was shown to generate molecules of which around 70% had improved bioactivity together with high 3D similarity but low 2D scaffold similarity to the template molecules. This ratio was 1.9 times higher than that of other state-of-the-art deep learning methods and of rule-based and virtual screening-based methods. Furthermore, we demonstrated that the model could generalize to new target proteins through fine-tuning with a small set of active compounds. Case studies also showed the advantages and usefulness of DeepHop in practical scaffold hopping scenarios.
The online version contains supplementary material available at 10.1186/s13321-021-00565-5.
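The pair-curation criteria named in the abstract (increased bioactivity, similar 3D structure, different 2D structure) can be sketched as a simple filter. This is a minimal illustration only, not the authors' pipeline: the `CandidatePair` record, the function names, and all three threshold values are hypothetical placeholders, since the abstract does not state the cutoffs, and the similarity scores are assumed to be precomputed on a 0 to 1 scale.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CandidatePair:
    """Hypothetical record for one (template, hopped) molecule pair."""
    source_id: str
    target_id: str
    source_pactivity: float  # e.g. -log10 activity of the template molecule
    target_pactivity: float  # same measure for the candidate hopped molecule
    sim_2d: float            # precomputed 2D scaffold similarity, 0..1
    sim_3d: float            # precomputed 3D similarity, 0..1


def is_hopping_pair(pair: CandidatePair,
                    min_gain: float = 1.0,     # assumed cutoff, not from the abstract
                    max_sim_2d: float = 0.6,   # assumed cutoff, not from the abstract
                    min_sim_3d: float = 0.6) -> bool:  # assumed cutoff
    """True if the pair gains activity, differs in 2D, and stays similar in 3D."""
    gain = pair.target_pactivity - pair.source_pactivity
    return (gain >= min_gain
            and pair.sim_2d <= max_sim_2d
            and pair.sim_3d >= min_sim_3d)


def curate_pairs(pairs: Iterable[CandidatePair], **thresholds: float) -> List[CandidatePair]:
    """Keep only the pairs that satisfy all three criteria."""
    return [p for p in pairs if is_hopping_pair(p, **thresholds)]


if __name__ == "__main__":
    demo = [
        CandidatePair("m1", "m2", 6.0, 7.5, sim_2d=0.30, sim_3d=0.80),  # kept
        CandidatePair("m1", "m3", 6.0, 7.5, sim_2d=0.90, sim_3d=0.80),  # same scaffold
        CandidatePair("m1", "m4", 6.0, 6.2, sim_2d=0.30, sim_3d=0.80),  # too little gain
    ]
    print([p.target_id for p in curate_pairs(demo)])
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the size of the curated set is to each cutoff.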
Project description:Target identification remains a major challenge for modern drug discovery programs aimed at understanding the molecular mechanisms of drugs. Computational target prediction approaches like 2D chemical similarity searches have been widely used but are limited to structures sharing high chemical similarity. Here, we present a new computational approach called chemical similarity network analysis pull-down 3D (CSNAP3D) that combines 3D chemical similarity metrics and network algorithms for structure-based drug target profiling, ligand deorphanization, and automated identification of scaffold hopping compounds. In conjunction with 2D chemical similarity fingerprints, CSNAP3D achieved a >95% success rate in correctly predicting the drug targets of 206 known drugs. Significant improvement in target prediction was observed for HIV reverse transcriptase (HIVRT) compounds, which consist of diverse scaffold hopping compounds targeting the nucleotidyltransferase binding site. CSNAP3D was further applied to a set of antimitotic compounds identified in a cell-based chemical screen and identified novel small molecules that share a pharmacophore with Taxol and display a Taxol-like mechanism of action, which were validated experimentally using in vitro microtubule polymerization assays and cell-based assays.
Project description:Molecular descriptor (2D) and three dimensional (3D) shape based similarity methods are widely used in ligand based virtual drug design. In the present study pairwise structure comparisons among a set of 4858 DTP compounds tested in the NCI60 tumor cell line anticancer drug screen were computed using chemical hashed fingerprints and 3D molecule shapes to calculate 2D and 3D similarities, respectively. Additionally, pairwise biological activity similarities were calculated by correlating the 60 element vectors of pGI50 values corresponding to the cytotoxicity of the compounds across the NCI60 panel. Subsequently, we compared the power of 2D and 3D structural similarity metrics to predict the toxicity pattern of compounds. We found that while the positive predictive value and sensitivity of 3D and molecular descriptor based approaches to predict biological activity are similar, a subset of molecule pairs yielded contradictory results. By simultaneously requiring similarity of biological activities and 3D shapes, and dissimilarity of molecular descriptor based comparisons, we identify pairs of scaffold hopping candidates displaying characteristic core structural changes such as heteroatom/heterocycle change and ring closure. Attempts to discover scaffold hopping candidates of mitoxantrone recovered known Topoisomerase II (Top2) inhibitors, and also predicted new, previously unknown chemotypes possessing in vitro Top2 inhibitory activity.
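The three quantities combined in this study can be illustrated with a short sketch: a Tanimoto coefficient on hashed-fingerprint bits for 2D similarity, a Pearson correlation of the 60-element pGI50 vectors for biological activity similarity, and a rule that requires high activity and 3D similarity together with low 2D similarity. The Tanimoto coefficient is an assumption here (the abstract does not name the 2D metric), the 3D shape similarity is taken as a precomputed input because it needs conformers and a cheminformatics toolkit, and every threshold is a hypothetical placeholder.

```python
import math
from typing import Sequence, Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 0.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length activity vectors (e.g. pGI50 values)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to correlate
    return cov / math.sqrt(var_x * var_y)


def is_scaffold_hop_candidate(sim_2d: float, sim_3d: float, activity_corr: float,
                              max_2d: float = 0.5,    # assumed cutoff
                              min_3d: float = 0.8,    # assumed cutoff
                              min_corr: float = 0.7) -> bool:  # assumed cutoff
    """Similar activity and 3D shape, but dissimilar 2D descriptors."""
    return sim_2d <= max_2d and sim_3d >= min_3d and activity_corr >= min_corr


if __name__ == "__main__":
    sim_2d = tanimoto({1, 2, 3}, {2, 3, 4})
    corr = pearson([5.0, 6.0, 7.0, 6.5], [5.1, 6.2, 6.9, 6.4])
    print(sim_2d, round(corr, 3), is_scaffold_hop_candidate(sim_2d, 0.85, corr))
```

Real NCI60 profiles contain missing values, which this sketch does not handle; a practical version would correlate only the cell lines measured for both compounds.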
Project description:The wwLigCSRre web server performs ligand-based screening using a 3D molecular similarity engine. Its aim is to provide a versatile online facility to assist the exploration of the chemical similarity of families of compounds, or to propose scaffold hops from a query compound. The service allows the user to screen several chemically diversified focused banks, such as Kinase-, CNS-, GPCR-, Ion-channel-, Antibacterial-, Anticancer- and Analgesic-focused libraries. The server also provides the possibility to screen the DrugBank and DSSTOX/Carcinogenic compounds databases. User banks can also be downloaded. The 3D similarity search combines both geometrical (3D) and physicochemical information. Starting from one 3D ligand molecule as query, the screening of such databases can yield previously unidentified compound scaffolds as hits or help to optimize previously identified hit molecules in a SAR (structure-activity relationship) project. wwLigCSRre can be accessed at http://bioserv.rpbs.univ-paris-diderot.fr/wwLigCSRre.html.
Project description:The data have been obtained from FABP4 inhibitor molecules previously published. The 120 compounds were used to build a 3D-QSAR model. The development of the QSAR model has been undertaken with the use of Forge software using the PM3 optimized structure and the experimental IC50 of each compound. The QSAR model was also employed to predict the activity of 3000 new isosteric derivatives of BMS309403. The isosteric replacement was also validated by the synthesis and the biological screening of three new compounds reported in the related research article "3D-QSAR assisted identification of FABP4 inhibitors: An effective scaffold hopping analysis/QSAR evaluation" (Floresta et al., 2019).
Project description:Molecular target identification is of central importance to drug discovery. Here, we developed a computational approach, named bioactivity profile similarity search (BASS), for associating targets to small molecules by using the known target annotations of related compounds from public databases. To evaluate BASS, a bioactivity profile database was constructed using 4296 compounds that were commonly tested in the US National Cancer Institute 60 human tumor cell line anticancer drug screen (NCI-60). Each compound was used as a query to search against the entire bioactivity profile database, and reference compounds with similar bioactivity profiles above a threshold of 0.75 were considered as neighbor compounds of the query. Potential targets were subsequently linked to the identified neighbor compounds by using the known targets of the query compound. About 45% of the predicted compound-target associations were successfully verified retrospectively, suggesting the possible application of BASS in identifying the targets of uncharacterized compounds and thus providing insight into the study of promiscuity and polypharmacology. Furthermore, BASS identified a significant fraction of structurally diverse compounds with similar bioactivities, indicating its feasibility of "scaffold hopping" in searching novel molecules against the target of interest.
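The neighbor search and target association steps described above can be sketched as follows. This is an illustration under stated assumptions, not the BASS implementation: the abstract gives the 0.75 threshold but does not name the similarity measure, so Pearson correlation is assumed, profiles are assumed to be complete vectors of equal length, and the function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; stands in for the unspecified profile similarity."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                     sum((b - mean_y) ** 2 for b in y))
    return cov / norm if norm else 0.0


def associate_targets(query_id: str,
                      profiles: Dict[str, List[float]],
                      known_targets: Dict[str, Set[str]],
                      threshold: float = 0.75) -> Dict[str, Set[str]]:
    """Link the query's known targets to every neighbor above the threshold."""
    query_profile = profiles[query_id]
    query_targets = known_targets.get(query_id, set())
    associations: Dict[str, Set[str]] = {}
    for ref_id, ref_profile in profiles.items():
        if ref_id == query_id:
            continue  # a compound is not its own neighbor
        if pearson(query_profile, ref_profile) > threshold:
            associations[ref_id] = set(query_targets)
    return associations


if __name__ == "__main__":
    profiles = {
        "query": [5.0, 6.0, 7.0, 8.0],
        "similar": [5.2, 5.9, 7.1, 8.3],
        "opposite": [8.0, 7.0, 6.0, 5.0],
    }
    known = {"query": {"TOP2A"}}  # illustrative annotation only
    print(associate_targets("query", profiles, known))
```

Running every compound as the query in turn, as in the evaluation described above, produces the full set of predicted compound-target associations that can then be checked retrospectively.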
Project description:Over-regulation of Heme oxygenase 1 (HO-1) has recently been identified in many types of human cancer, and in these cases, poor clinical outcomes are normally reported. Indeed, the inhibition of HO-1 is being considered as an anticancer approach. An imidazole scaffold is normally present in most of the classical HO-1 inhibitors and seems indispensable to the inhibitory activity due to its strong interaction with the Fe(II) of the heme group. In this paper, we searched for new potential HO-1 inhibitors among three different databases: Marine Natural Products (MNP), ZINC Natural Products (ZNP) and Super Natural II (SN2). A total of 484,527 compounds were retrieved from the databases and filtered through four statistical/computational filters (2D descriptors, 2D-QSAR pharmacophoric model, 3D-QSAR pharmacophoric model, and docking). Different imidazole-based compounds were suggested by our methodology to be potentially active in inhibiting HO-1, and the results have been rationalized by the bioactivity of the filtered molecules reported in the literature.
Project description:Trk receptor tyrosine kinases have been implicated in cancer and pain. A crystal structure of TrkA with AZ-23 (1a) was obtained, and scaffold hopping resulted in two 5/6-bicyclic series comprising either imidazo[4,5-b]pyridines or purines. Further optimization of these two fusion series led to compounds with subnanomolar potencies against TrkA kinase in cellular assays. Antitumor effects in a TrkA-driven mouse allograft model were demonstrated with compounds 2d and 3a.
Project description:Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compounds, in which predictive (QSAR) models have been applied to enrich target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target and the benchmarking and analysis workflow provides a streamlined solution to identifying productive docking configurations. We show that an informative docking configuration can inform the REINVENT agent to optimize towards improving docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules which do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream. Supplementary Information: The online version contains supplementary material available at 10.1186/s13321-021-00563-7.
Project description:Affinity fingerprints report the activity of small molecules across a set of assays, and thus permit to gather information about the bioactivities of structurally dissimilar compounds, where models based on chemical structure alone are often limited, and model complex biological endpoints, such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this aim, we apply and validate a framework for the calculation of QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated using Ki, Kd, IC50 and EC50 data from ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP we assembled IC50 data from ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery, and 25 diverse protein target data sets. This study complements part 1 where the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification is evaluated. Despite being inherently noisy, we show that using QAFFP as descriptors leads to errors in prediction on the test set in the ~0.65-0.95 pIC50 units range, which are comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76-1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and 1D and 2D physicochemical descriptors, with an effect size in the 0.02-0.08 pIC50 units range. Including QSAR models with low predictive power in the generation of QAFFP does not lead to improved predictive power. Given that the QSAR models we used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets. Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
Project description:Two 3D quantitative structure-activity relationships (3D-QSAR) models for predicting Cannabinoid receptor 1 and 2 (CB1 and CB2) ligands have been produced by way of creating a practical tool for the drug-design and optimization of CB1 and CB2 ligands. A set of 312 molecules have been used to build the model for the CB1 receptor, and a set of 187 molecules for the CB2 receptor. All of the molecules were recovered from the literature among those possessing measured Ki values, and Forge was used as software. The present model shows high and robust predictive potential, confirmed by the quality of the statistical analysis, and an adequate descriptive capability. A visual understanding of the hydrophobic, electrostatic, and shaping features highlighting the principal interactions for the CB1 and CB2 ligands was achieved with the construction of 3D maps. The predictive capabilities of the model were then used for a scaffold-hopping study of two selected compounds, with the generation of a library of new compounds with high affinity for the two receptors. Herein, we report two new 3D-QSAR models that comprehend a large number of chemically different CB1 and CB2 ligands and well account for the individual ligand affinities. These features will facilitate the recognition of new potent and selective molecules for CB1 and CB2 receptors.