Project description:A total of 42 trisubstituted carboranes categorised into five scaffolds were systematically designed and synthesized by exploiting the different reactivities of the twelve vertices of o-, m-, and p-carboranes to cover all directions in chemical space. Significant inhibitors of hypoxia inducible factor transcriptional activity were mainly observed among scaffold V compounds (e.g., Vi-m and Vo), whereas anti-rabies virus activity was observed among scaffold V (Va-h), scaffold II (IIb-g), and scaffold IV (IVb) compounds. The pharmacophore model predicted from compounds with scaffold V, which exhibited significant anti-rabies virus activity, agreed well with compounds IIb-g with scaffold II and compound IVb with scaffold IV. Normalized principal moment of inertia analysis indicated that carboranes with scaffolds I-V cover all regions in the chemical space. Furthermore, the first compounds shown to stimulate the proliferation of the rabies virus were found among scaffold V carboranes.
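The normalized principal moment of inertia (NPR) analysis mentioned above can be illustrated with a minimal sketch. It assumes a molecule is given as a list of 3D coordinates with per-atom masses (all names and inputs here are hypothetical, not taken from the study). The two ratios NPR1 = I1/I3 and NPR2 = I2/I3, with I1 <= I2 <= I3, place a shape inside a triangle whose corners are rod (0, 1), disc (0.5, 0.5), and sphere (1, 1).

```python
import math

def principal_moments(coords, masses):
    """Return the three principal moments of inertia, sorted ascending."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for m, c in zip(masses, coords):
        x, y, z = c[0] - com[0], c[1] - com[1], c[2] - com[2]
        ixx += m * (y * y + z * z)
        iyy += m * (x * x + z * z)
        izz += m * (x * x + y * y)
        ixy -= m * x * y
        ixz -= m * x * z
        iyz -= m * y * z
    off = ixy * ixy + ixz * ixz + iyz * iyz
    if off == 0.0:
        # Tensor is already diagonal.
        return sorted([ixx, iyy, izz])
    # Closed-form (trigonometric) eigenvalues of a symmetric 3x3 matrix.
    q = (ixx + iyy + izz) / 3.0
    p = math.sqrt(((ixx - q) ** 2 + (iyy - q) ** 2 + (izz - q) ** 2 + 2.0 * off) / 6.0)
    b00, b11, b22 = (ixx - q) / p, (iyy - q) / p, (izz - q) / p
    b01, b02, b12 = ixy / p, ixz / p, iyz / p
    det = (b00 * (b11 * b22 - b12 * b12)
           - b01 * (b01 * b22 - b12 * b02)
           + b02 * (b01 * b12 - b11 * b02))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest
    return [smallest, middle, largest]

def npr(coords, masses):
    """Return (NPR1, NPR2) = (I1/I3, I2/I3) for one conformer."""
    i1, i2, i3 = principal_moments(coords, masses)
    if i3 <= 0.0:
        raise ValueError("need at least two distinct points")
    return i1 / i3, i2 / i3
```

In practice a cheminformatics toolkit would supply the conformer coordinates and atomic masses; the sketch only shows how the two plotted ratios arise from the inertia tensor.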
Project description:Tuberculosis (TB), caused by Mycobacterium tuberculosis, continues to be a debilitating disease, killing nearly 1.5 million people in 2019, with 2 billion people worldwide affected by latent TB. Emerging multidrug-resistant and totally drug-resistant strains further exacerbate the TB burden. The cell wall of bacteria provides critical virulence components such as cell surface proteins, regulators, signal transduction proteins, and toxins. The cell wall biosynthesis pathway of Mycobacterium tuberculosis has been studied exhaustively to discover novel drug targets. Decaprenylphosphoryl-β-d-ribose-2'-epimerase (DprE1) is an important enzyme involved in the arabinogalactan biosynthetic pathway of the Mycobacterium tuberculosis cell wall and is essential for both latent and persistent bacterial infection. We analyzed all known ∼1300 DprE1 inhibitors to gain deep insights into the chemogenomic space of DprE1-ligand complexes. Principal component analysis of physicochemical descriptors showed that the DprE1 inhibitors have a marked lipophilic character and form a cluster distinct from the existing TB drugs. Similarity analysis using Murcko scaffolds and rubber band scaling revealed scarce representation of the chemical space. Further, Murcko scaffold analysis uncovered favorable and unfavorable scaffolds, where benzo and pyridine-based core scaffolds exhibit the highest biological activity, as evidenced by their MIC and IC50 values. Automatic SAR and R-group decomposition analysis identified substructures responsible for the inhibitory activity against the DprE1 enzyme. Further, with activity cliff analysis, we observed prominent discontinuity in the SAR of DprE1 inhibitors, where even a simple structural modification of the chemical scaffold resulted in a significant potency difference, presumably due to the binding orientation and interaction in the active site.
Thiophene, 6-membered aromatic rings, and unsubstituted benzene ring-based toxicophores were identified in the DprE1 chemical space using an artificial intelligence approach based on inductive logic programming. This paper, hence, ushers in new insights for the design and development of potent covalent and non-covalent DprE1 inhibitors and guides hit and lead optimization for the development of non-hazardous small molecule therapeutics for Mycobacterium tuberculosis.
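The activity cliff analysis described above can be sketched as follows. This is a toy illustration and not the authors' workflow: fingerprints are represented as plain sets of hypothetical bit indices, and the similarity cutoff (0.7) and potency gap (2 log units) are assumed values chosen for the example.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def activity_cliffs(compounds, sim_cutoff=0.7, potency_gap=2.0):
    """Return pairs that are structurally similar but differ strongly in potency.

    compounds: list of (name, fingerprint_set, pIC50) tuples.
    """
    cliffs = []
    for (n1, fp1, p1), (n2, fp2, p2) in combinations(compounds, 2):
        sim = tanimoto(fp1, fp2)
        gap = abs(p1 - p2)
        if sim >= sim_cutoff and gap >= potency_gap:
            cliffs.append((n1, n2, sim, gap))
    return cliffs

# Hypothetical data: names, bit sets, and pIC50 values are invented for illustration.
toy = [
    ("A", {1, 2, 3, 4, 5, 6, 7, 8}, 8.1),
    ("B", {1, 2, 3, 4, 5, 6, 7, 9}, 5.2),
    ("C", {10, 11, 12}, 5.0),
    ("D", {1, 2, 3, 4, 5, 6, 7, 8, 9}, 7.9),
]
cliff_pairs = [(a, b) for a, b, _, _ in activity_cliffs(toy)]
```

Here compounds A and B differ by a single bit yet are almost three log units apart in potency, which is the kind of SAR discontinuity the abstract refers to.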
Project description:Pharmit (http://pharmit.csb.pitt.edu) provides an online, interactive environment for the virtual screening of large compound databases using pharmacophores, molecular shape and energy minimization. Users can import, create and edit virtual screening queries in an interactive browser-based interface. Queries are specified in terms of a pharmacophore, a spatial arrangement of the essential features of an interaction, and molecular shape. Search results can be further ranked and filtered using energy minimization. In addition to a number of pre-built databases of popular compound libraries, users may submit their own compound libraries for screening. Pharmit uses state-of-the-art sub-linear algorithms to provide interactive screening of millions of compounds. Queries typically take a few seconds to a few minutes depending on their complexity. This allows users to iteratively refine their search during a single session. The easy access to large chemical datasets provided by Pharmit simplifies and accelerates structure-based drug design. Pharmit is available under a dual BSD/GPL open-source license.
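To make the notion of a pharmacophore query concrete, the following minimal sketch checks whether a molecule's features can be assigned to the query features so that feature types agree and all pairwise distances agree within a tolerance. It is a brute-force illustration under stated assumptions, not Pharmit's sub-linear search algorithm; the feature tuples, the tolerance, and the function name are hypothetical, and mirror-image arrangements are not distinguished.

```python
from itertools import permutations
from math import dist

def matches_pharmacophore(query, molecule, tol=1.0):
    """Return True if some assignment of molecule features satisfies the query.

    query, molecule: lists of (feature_type, (x, y, z)) tuples.
    tol: allowed deviation in each pairwise distance (same units as coordinates).
    """
    n = len(query)
    for cand in permutations(molecule, n):
        # Feature types must agree position by position.
        if any(q[0] != m[0] for q, m in zip(query, cand)):
            continue
        # Pairwise distances are invariant to rotation and translation.
        if all(abs(dist(query[i][1], query[j][1]) - dist(cand[i][1], cand[j][1])) <= tol
               for i in range(n) for j in range(i + 1, n)):
            return True
    return False
```

Comparing distances rather than raw coordinates means the molecule does not need to be pre-aligned to the query; the factorial cost of trying every assignment is exactly what a production screening tool has to avoid.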
Project description:The vast size of composition space poses a significant challenge for materials chemistry: exhaustive enumeration of potentially interesting compositions is typically infeasible, hindering assessment of important criteria ranging from novelty and stability to cost and performance. We report a tool, Comgen, for the efficient exploration of composition space, which makes use of logical methods from computer science used for proving theorems. We demonstrate how these techniques, which have not previously been applied to materials discovery, can enable reasoning about scientific domain knowledge provided by human experts. Comgen accepts a variety of user-specified criteria, converts these into an abstract form, and utilises a powerful automated reasoning algorithm to identify compositions that satisfy these user requirements, or prove that the requirements cannot be simultaneously satisfied. In contrast to machine learning techniques, explicitly reasoning about domain knowledge, rather than making inferences from data, ensures that Comgen's outputs are fully interpretable and provably correct. Users interact with Comgen through a high-level Python interface. We illustrate use of the tool with several case studies focused on the search for new ionic conductors. Further, we demonstrate the integration of Comgen into an end-to-end automated workflow to propose and evaluate candidate compositions quantitatively, prior to experimental investigation. This highlights the potential of automated formal reasoning in materials chemistry.
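One kind of user-specified criterion such a tool can reason about is charge neutrality. The sketch below is a toy that finds charge-neutral compositions by bounded brute-force enumeration; it does not use Comgen's interface or an automated reasoning engine, and the function name, fixed oxidation states, and atom limit are assumptions made for illustration.

```python
from functools import reduce
from itertools import product
from math import gcd

def neutral_compositions(charges, max_atoms):
    """Enumerate reduced, charge-neutral formulas containing every listed element.

    charges: dict mapping element symbol -> assumed fixed oxidation state.
    max_atoms: upper bound on the total number of atoms in the formula.
    """
    elements = sorted(charges)
    results = []
    for counts in product(range(1, max_atoms + 1), repeat=len(elements)):
        if sum(counts) > max_atoms:
            continue
        if reduce(gcd, counts) != 1:
            continue  # skip integer multiples of a smaller formula
        if sum(n * charges[e] for e, n in zip(elements, counts)) == 0:
            results.append(dict(zip(elements, counts)))
    return results
```

An empty result only shows that no solution exists within the chosen bound. The enumeration also grows exponentially with the number of elements, which is the scaling problem that motivates replacing enumeration with formal reasoning.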
Project description:Natural products (NPs) are a rich source of novel compound classes and new drugs. In the present study we have used the chemical space navigation tool ChemGPS-NP to evaluate the chemical space occupancy by NPs and bioactive medicinal chemistry compounds from the database WOMBAT. The two sets differ notably in coverage of chemical space, and tangible leadlike NPs were found to cover regions of chemical space that lack representation in WOMBAT. Property based similarity calculations were performed to identify NP neighbors of approved drugs. Several of the NPs revealed by this method were confirmed to exhibit the same activity as their drug neighbors. The identification of leads from a NP starting point may prove a useful strategy for drug discovery in the search for novel leads with unique properties.
Project description:We present a volume exploration framework, FeatureLego, that uses a novel voxel clustering approach for efficient selection of semantic features. We partition the input volume into a set of compact super-voxels that represent the finest selection granularity. We then perform an exhaustive clustering of these super-voxels using a graph-based clustering method. Unlike the prevalent brute-force parameter sampling approaches, we propose an efficient algorithm to perform this exhaustive clustering. By computing an exhaustive set of clusters, we aim to capture as many boundaries as possible and ensure that the user has sufficient options for efficiently selecting semantically relevant features. Furthermore, we merge all the computed clusters into a single tree of meta-clusters that can be used for hierarchical exploration. We implement an intuitive user-interface to interactively explore volumes using our clustering approach. Finally, we show the effectiveness of our framework on multiple real-world datasets of different modalities.
Project description:A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some chemical space navigation tasks (activity cliffs and activity landscapes identification) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com).
Project description:Chemical space exploration is a major task of the hit-finding process during the pursuit of novel chemical entities. Compared with other screening technologies, computational de novo design has become a popular approach to overcome the limitations of current chemical libraries. Here, we report a de novo design platform named systemic evolutionary chemical space explorer (SECSE). The platform was conceptually inspired by fragment-based drug design and miniaturizes a "lego-building" process within the pocket of a given target. The key to generating virtual hits thereby becomes a computational search problem. To enhance search and optimization, human intelligence and deep learning were integrated. Application of SECSE against phosphoglycerate dehydrogenase (PHGDH) demonstrated its potential for finding novel and diverse small molecules that are attractive starting points for further validation. The platform is open source, and the code is available at http://github.com/KeenThera/SECSE.
Project description:BACKGROUND:Chemical space is virtual space occupied by all chemically meaningful organic compounds. It is an important concept in contemporary chemoinformatics research, and its systematic exploration is vital to the discovery of either novel drugs or new tools for chemical biology. RESULTS:In this paper, we describe Molpher, an open-source framework for the systematic exploration of chemical space. Through a process we term 'molecular morphing', Molpher produces a path of structurally-related compounds. This path is generated by the iterative application of so-called 'morphing operators' that represent simple structural changes, such as the addition or removal of an atom or a bond. Molpher incorporates an optimized parallel exploration algorithm, compound logging and a two-dimensional visualization of the exploration process. Its feature set can be easily extended by implementing additional morphing operators, chemical fingerprints, similarity measures and visualization methods. Molpher not only offers an intuitive graphical user interface, but also can be run in batch mode. This enables users to easily incorporate molecular morphing into their existing drug discovery pipelines. CONCLUSIONS:Molpher is an open-source software framework for the design of virtual chemical libraries focused on a particular mechanistic class of compounds. These libraries, represented by a morphing path and its surroundings, provide valuable starting data for future in silico and in vitro experiments. Molpher is highly extensible and can be easily incorporated into any existing computational drug design pipeline.
Project description:Drug discovery can be thought of as a search for a needle in a haystack: searching through a large chemical space for the most active compounds. Computational techniques can narrow the search space for experimental follow up, but even they become unaffordable when evaluating large numbers of molecules. Therefore, machine learning (ML) strategies are being developed as computationally cheaper complementary techniques for navigating and triaging large chemical libraries. Here, we explore how an active learning protocol can be combined with first-principles based alchemical free energy calculations to identify high affinity phosphodiesterase 2 (PDE2) inhibitors. We first calibrate the procedure using a set of experimentally characterized PDE2 binders. The optimized protocol is then used prospectively on a large chemical library to navigate toward potent inhibitors. In the active learning cycle, at every iteration a small fraction of compounds is probed by alchemical calculations and the obtained affinities are used to train ML models. With successive rounds, high affinity binders are identified by explicitly evaluating only a small subset of compounds in a large chemical library, thus providing an efficient protocol that robustly identifies a large fraction of true positives.
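The active learning cycle described above can be sketched in a few lines. Everything in this example is a stand-in: the "library" is a list of integers acting as one-dimensional descriptors, `oracle` is a toy function replacing the expensive alchemical free energy calculation (lower is better), and the surrogate model is a 1-nearest-neighbour lookup rather than a trained ML model. Selection is purely greedy, which is only one of several possible acquisition strategies.

```python
def oracle(x):
    """Toy stand-in for an expensive affinity calculation (lower is better)."""
    return -12.0 + ((x - 67) / 20.0) ** 2

def predict(x, labeled):
    """1-nearest-neighbour surrogate: copy the value of the closest labeled compound."""
    nearest = min(labeled, key=lambda known: abs(known - x))
    return labeled[nearest]

def active_learning(library, oracle, initial, batch_size, rounds):
    """Iteratively label the compounds the surrogate predicts to bind best."""
    labeled = {x: oracle(x) for x in initial}
    for _ in range(rounds):
        pool = [x for x in library if x not in labeled]
        if not pool:
            break
        # Stable sort: compounds with equal predictions keep their library order.
        pool.sort(key=lambda x: predict(x, labeled))
        for x in pool[:batch_size]:
            labeled[x] = oracle(x)  # the only place the costly oracle is called
    return labeled

library = list(range(100))
labeled = active_learning(library, oracle, initial=library[::20], batch_size=5, rounds=3)
best = min(labeled, key=labeled.get)
```

In this toy run the loop evaluates only a fraction of the library with the oracle while still reaching its best-scoring member, mirroring the idea of explicitly evaluating a small subset of compounds per iteration.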