EpiDBase: a manually curated database for small molecule modulators of epigenetic landscape.
ABSTRACT: We have developed EpiDBase (www.epidbase.org), an interactive database of small molecule ligands of epigenetic protein families by bringing together experimental, structural and chemoinformatic data in one place. Currently, EpiDBase encompasses 5784 unique ligands (11?422 entries) of various epigenetic markers such as writers, erasers and readers. The EpiDBase includes experimental IC(50) values, ligand molecular weight, hydrogen bond donor and acceptor count, XlogP, number of rotatable bonds, number of aromatic rings, InChIKey, two-dimensional and three-dimensional (3D) chemical structures. A catalog of all epidbase ligands based on the molecular weight is also provided. A structure editor is provided for 3D visualization of ligands. EpiDBase is integrated with tools like text search, disease-specific search, advanced search, substructure, and similarity analysis. Advanced analysis can be performed using substructure and OpenBabel-based chemical similarity fingerprints. The EpiDBase is curated to identify unique molecular scaffolds. Initially, molecules were selected by removing peptides, macrocycles and other complex structures and then processed for conformational sampling by generating 3D conformers. Subsequent filtering through Zinc Is Not Commercial (ZINC: a free database of commercially available compounds for virtual screening) and Lilly MedChem regular rules retained many distinctive drug-like molecules. These molecules were then analyzed for physicochemical properties using OpenBabel descriptors and clustered using various methods such as hierarchical clustering, binning partition and multidimensional scaling. EpiDBase provides comprehensive resources for further design, development and refinement of small molecule modulators of epigenetic markers.
Project description:Identifying and purchasing new small molecules to test in biological assays are enabling for ligand discovery, but as purchasable chemical space continues to grow into the tens of billions based on inexpensive make-on-demand compounds, simply searching this space becomes a major challenge. We have therefore developed ZINC20, a new version of ZINC with two major new features: billions of new molecules and new methods to search them. As a fully enumerated database, ZINC can be searched precisely using explicit atomic-level graph-based methods, such as SmallWorld for similarity and Arthor for pattern and substructure search, as well as 3D methods such as docking. Analysis of the new make-on-demand compound sets by these and related tools reveals startling features. For instance, over 97% of the core Bemis-Murcko scaffolds in make-on-demand libraries are unavailable from "in-stock" collections. Correspondingly, the number of new Bemis-Murcko scaffolds is rising almost as a linear fraction of the elaborated molecules. Thus, an 88-fold increase in the number of molecules in the make-on-demand versus the in-stock sets is built upon a 16-fold increase in the number of Bemis-Murcko scaffolds. The make-on-demand library is also more structurally diverse than physical libraries, with a massive increase in disc- and sphere-like shaped molecules. The new system is freely available at zinc20.docking.org.
Project description:Building 3-dimensional (3D) molecules is the starting point in molecular modeling. Conformer search and identification of a global energy minimum structure are often performed computationally during spectral analysis of data from NMR, IR, and VCD or during rational drug design through ligand-based, structure-based, and QSAR approaches. I herein report a convenient script that allows for automated building of 3D structures and conformer searching from 2-dimensional (2D) drawing of chemical structures. With this Bash shell script, which runs on Mac OS X and the Linux platform, the tasks are consecutively and iteratively executed without a 3D molecule builder via the command line interface of the free (academic) software OpenBabel, Balloon, and MOPAC2012. A large number of 2D chemical drawing files can be processed simultaneously, and the script functions with stereoisomers. Semi-empirical quantum chemical calculation ensures reliable ranking of the generated conformers on the basis of energy. In addition to an energy-sorted list of file names of the conformers, their Gaussian input files are provided for ab initio and density functional theory calculations to predict rigorous electronic energies, structures, and properties. This script is freely available to all scientists.
Project description:BACKGROUND: Drug discovery and chemical biology are exceedingly complex and demanding enterprises. In recent years there are been increasing awareness about the importance of predicting/optimizing the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of small chemical compounds along the search process rather than at the final stages. Fast methods for evaluating ADMET properties of small molecules often involve applying a set of simple empirical rules (educated guesses) and as such, compound collections' property profiling can be performed in silico. Clearly, these rules cannot assess the full complexity of the human body but can provide valuable information and assist decision-making. RESULTS: This paper presents FAF-Drugs2, a free adaptable tool for ADMET filtering of electronic compound collections. FAF-Drugs2 is a command line utility program (e.g., written in Python) based on the open source chemistry toolkit OpenBabel, which performs various physicochemical calculations, identifies key functional groups, some toxic and unstable molecules/functional groups. In addition to filtered collections, FAF-Drugs2 can provide, via Gnuplot, several distribution diagrams of major physicochemical properties of the screened compound libraries. CONCLUSION: We have developed FAF-Drugs2 to facilitate compound collection preparation, prior to (or after) experimental screening or virtual screening computations. Users can select to apply various filtering thresholds and add rules as needed for a given project. As it stands, FAF-Drugs2 implements numerous filtering rules (23 physicochemical rules and 204 substructure searching rules) that can be easily tuned.
Project description:<h4>Unlabelled</h4><h4>Background</h4>Searching for substructures in molecules belongs to the most elementary tasks in cheminformatics and is nowadays part of virtually every cheminformatics software. The underlying algorithms, used over several decades, are designed for the application to general graphs. Applied on molecular graphs, little effort has been spend on characterizing their performance. Therefore, it is not clear how current substructure search algorithms behave on such special graphs. One of the main reasons why such an evaluation was not performed in the past was the absence of appropriate data sets.<h4>Results</h4>In this paper, we present a systematic evaluation of Ullmann's and the VF2 subgraph isomorphism algorithms on molecular data. The benchmark set consists of a collection of 1235 SMARTS substructure expressions and selected molecules from the ZINC database. The benchmark evaluates substructures search times for complete database scans as well as individual substructure-molecule pairs. In detail, we focus on the influence of substructure formulation and size, the impact of molecule size, and the ability of both algorithms to be used on multiple cores.<h4>Conclusions</h4>The results show a clear superiority of the VF2 algorithm in all test scenarios. In general, both algorithms solve most instances in less than one millisecond, which we consider to be acceptable. Still, in direct comparison, the VF2 is most often several folds faster than Ullmann's algorithm. Additionally, Ullmann's algorithm shows a surprising number of run time outliers.
Project description:BACKGROUND: Scripting languages such as Python are ideally suited to common programming tasks in cheminformatics such as data analysis and parsing information from files. However, for reasons of efficiency, cheminformatics toolkits such as the OpenBabel toolkit are often implemented in compiled languages such as C++. We describe Pybel, a Python module that provides access to the OpenBabel toolkit. RESULTS: Pybel wraps the direct toolkit bindings to simplify common tasks such as reading and writing molecular files and calculating fingerprints. Extensive use is made of Python iterators to simplify loops such as that over all the molecules in a file. A Pybel Molecule can be easily interconverted to an OpenBabel OBMol to access those methods or attributes not wrapped by Pybel. CONCLUSION: Pybel allows cheminformaticians to rapidly develop Python scripts that manipulate chemical information. It is open source, available cross-platform, and offers the power of the OpenBabel toolkit to Python programmers.
Project description:R3D-BLAST is a BLAST-like search tool that allows the user to quickly and accurately search against the PDB for RNA structures sharing similar substructures with a specified query RNA structure. The basic idea behind R3D-BLAST is that all the RNA 3D structures deposited in the PDB are first encoded as 1D structural sequences using a structural alphabet of 23 distinct nucleotide conformations, and BLAST is then applied to these 1D structural sequences to search for those RNA substructures whose 1D structural sequences are similar to that of the query RNA substructure. R3D-BLAST takes as input an RNA 3D structure in the PDB format and outputs all substructures of the hits similar to that of the query with a graphical display to show their structural superposition. In addition, each RNA substructure hit found by R3D-BLAST has an associated E-value to measure its statistical significance. R3D-BLAST is now available online at http://genome.cs.nthu.edu.tw/R3D-BLAST/ for public access.
Project description:We present a novel highly efficient method for the detection of a pharmacophore from a set of drug-like ligands that interact with a target receptor. A pharmacophore is a spatial arrangement of physico-chemical features in a ligand that is essential for the interaction with a specific receptor. In the absence of a known three-dimensional (3D) receptor structure, a pharmacophore can be identified from a multiple structural alignment of ligand molecules. The key advantages of the presented algorithm are: (a) its ability to multiply align flexible ligands in a deterministic manner, (b) its ability to focus on subsets of the input ligands, which may share a large common substructure, resulting in the detection of both outlier molecules and alternative binding modes, and (c) its computational efficiency, which allows to detect pharmacophores shared by a large number of molecules on a standard PC. The algorithm was extensively tested on a dataset of almost 80 ligands acting on 12 different receptors. The results, which were achieved using a set of standard default parameters, were consistent with reference pharmacophores that were derived from the bound ligand-receptor complexes. The pharmacophores detected by the algorithm are expected to be a key component in the discovery of new leads by screening large databases of drug-like molecules. A user-friendly web interface is available at http://bioinfo3d.cs.tau.ac.il/pharma. Supplementary material can be found at http://bioinfo3d.cs.tau.ac.il/pharma/reduction/.
Project description:BACKGROUND: Virtual screening methods are now well established as effective to identify hit and lead candidates and are fully integrated in most drug discovery programs. Ligand-based approaches make use of physico-chemical, structural and energetics properties of known active compounds to search large chemical libraries for related and novel chemotypes. While 2D-similarity search tools are known to be fast and efficient, the use of 3D-similarity search methods can be very valuable to many research projects as integration of "3D knowledge" can facilitate the identification of not only related molecules but also of chemicals possessing distant scaffolds as compared to the query and therefore be more inclined to scaffolds hopping. To date, very few methods performing this task are easily available to the scientific community. RESULTS: We introduce a new approach (LigCSRre) to the 3D ligand similarity search of drug candidates. It combines a 3D maximum common substructure search algorithm independent on atom order with a tunable description of atomic compatibilities to prune the search and increase its physico-chemical relevance. We show, on 47 experimentally validated active compounds across five protein targets having different specificities, that for single compound search, the approach is able to recover on average 52% of the co-actives in the top 1% of the ranked list which is better than gold standards of the field. Moreover, the combination of several runs on a single protein target using different query active compounds shows a remarkable improvement in enrichment. Such Results demonstrate LigCSRre as a valuable tool for ligand-based screening. CONCLUSION: LigCSRre constitutes a new efficient and generic approach to the 3D similarity screening of small compounds, whose flexible design opens the door to many enhancements. The program is freely available to the academics for non-profit research at: http://bioserv.rpbs.univ-paris-diderot.fr/LigCSRre.html.
Project description:Nitrogen-based groups are usually not used as ligands to coordinate to the ptC atom. However, here we reported only nitrogen-based ligands to accomplish a theoretically successful square planar C(N)<sub>4</sub> substructure. The first difficulty in accomplishing a square ptC(N)<sub>4</sub> substructure is to conquer the tremendous strain from the planar to tetrahedral arrangements, and the second is to restrict it in a suitable system with the right symmetry. We designed several neutral molecules with the square ptC(N)<sub>4</sub> substructures, and the molecules were studied using the density functional theory method at the B3LYP/6-311++G(3df,3pd) and TPSSh/6-311++G(3df,3pd) level of theory. The results of this work show that the molecules are all real minima on the potential energy surface and successfully achieved the square ptC(N)<sub>4</sub> substructure in the theoretical method. The group orbitals among the square ptC(N)<sub>4</sub> arrangement in the <i>D</i> <sub>2<i>d</i></sub> symmetry have been discussed and used to investigate the bonding interactions among all atoms in the square ptC(N)<sub>4</sub> substructure. Usually, the ptC systems have 18 valence electrons, but the present ptC systems mentioned in this work have 24 valence electrons, which is unusual for ptC.
Project description:The acid dissociation constant is an important molecular property, and it can be successfully predicted by Quantitative Structure-Property Relationship (QSPR) models, even for in silico designed molecules. We analyzed how the methodology of in silico 3D structure preparation influences the quality of QSPR models. Specifically, we evaluated and compared QSPR models based on six different 3D structure sources (DTP NCI, Pubchem, Balloon, Frog2, OpenBabel, and RDKit) combined with four different types of optimization. These analyses were performed for three classes of molecules (phenols, carboxylic acids, anilines), and the QSPR model descriptors were quantum mechanical (QM) and empirical partial atomic charges. Specifically, we developed 516 QSPR models and afterward systematically analyzed the influence of the 3D structure source and other factors on their quality. Our results confirmed that QSPR models based on partial atomic charges are able to predict pKa with high accuracy. We also confirmed that ab initio and semiempirical QM charges provide very accurate QSPR models and using empirical charges based on electronegativity equalization is also acceptable, as well as advantageous, because their calculation is very fast. On the other hand, Gasteiger-Marsili empirical charges are not applicable for pKa prediction. We later found that QSPR models for some classes of molecules (carboxylic acids) are less accurate. In this context, we compared the influence of different 3D structure sources. We found that an appropriate selection of 3D structure source and optimization method is essential for the successful QSPR modeling of pKa. Specifically, the 3D structures from the DTP NCI and Pubchem databases performed the best, as they provided very accurate QSPR models for all the tested molecular classes and charge calculation approaches, and they do not require optimization. Also, Frog2 performed very well. Other 3D structure sources can also be used but are not so robust, and an unfortunate combination of molecular class and charge calculation approach can produce weak QSPR models. Additionally, these 3D structures generally need optimization in order to produce good quality QSPR models.