Prediction of cancer drugs by chemical-chemical interactions.
ABSTRACT: Cancer, a leading cause of death worldwide, places a heavy burden on health-care systems. In this study, an order-prediction model was built to predict a series of cancer drug indications based on chemical-chemical interactions. According to the confidence scores of its interactions, each query drug was assigned an ordering of cancers from the most likely to the least likely. The 1st order prediction accuracy on the training dataset was 55.93%, evaluated by the jackknife test, while it was 55.56% and 59.09% on a validation test dataset and an independent test dataset, respectively. The proposed method outperformed a popular method based on molecular descriptors. Moreover, some drugs were verified to be effective against their 'wrong' predicted indications, indicating that some of these 'wrong' predictions were in fact correct. Encouraged by these promising results, we expect that the method may become a useful tool for predicting drug indications.
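The ranking step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each candidate cancer is scored by the highest confidence score among the query drug's interactions with drugs already annotated with that cancer. The function name, drug identifiers, and scores are all hypothetical.

```python
from collections import defaultdict


def rank_indications(query, interactions, known_indications):
    """Order candidate indications for `query` from most to least likely."""
    # Assumption: an indication's score is the strongest interaction between
    # the query drug and any drug already annotated with that indication.
    scores = defaultdict(float)
    for (drug_a, drug_b), confidence in interactions.items():
        if query not in (drug_a, drug_b):
            continue  # this interaction does not involve the query drug
        neighbor = drug_b if drug_a == query else drug_a
        for indication in known_indications.get(neighbor, ()):
            scores[indication] = max(scores[indication], confidence)
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores, key=lambda ind: (-scores[ind], ind))


# Toy data (hypothetical drug names and confidence scores).
interactions = {
    ("query", "drugA"): 900,
    ("drugB", "query"): 400,
    ("drugA", "drugB"): 700,
}
known_indications = {
    "drugA": ["lung cancer"],
    "drugB": ["breast cancer", "lung cancer"],
}
ranking = rank_indications("query", interactions, known_indications)
print(ranking)
```

The first element of the returned list plays the role of the 1st order prediction. A drug with no interactions in the network gets an empty ranking.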
Project description:Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches can be categorized as disease-centric or drug-centric according to their starting point, or as small-scale or large-scale applications according to the diversity of their datasets. Here, using a large drug indication dataset, a classifier was constructed to predict the indications of a drug based on the assumption that interactive/associated drugs, or drugs with similar structures, are more likely to target the same diseases. To examine the classifier, it was run five times on a dataset of 1,573 drugs retrieved from the Comprehensive Medicinal Chemistry database, each run evaluated by 5-fold cross-validation, yielding five 1st order prediction accuracies that were all approximately 51.48%. Meanwhile, the model yielded a 1st order prediction accuracy of 50.00% in an independent test on a dataset of 32 other drugs for which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets were successfully identified by our method. These results suggest that our method may become a useful tool for associating novel molecules with new indications, or existing drugs with alternative indications.
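The 5-fold cross-validation protocol mentioned above can be illustrated with a small index-splitting sketch. It only shows how 1,573 drugs might be partitioned into five folds. The function name and seed are hypothetical, and the classifier itself is not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


folds = k_fold_indices(1573, k=5, seed=0)
for i, test_fold in enumerate(folds):
    # Each fold serves once as the test set; the rest form the training set.
    train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    print(f"fold {i}: {len(train)} training drugs, {len(test_fold)} test drugs")
```

Repeating the procedure five times, as the abstract describes, would amount to calling the function with five different seeds.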
Project description:Toxicity is a major contributor to the high attrition rates of new chemical entities in drug discovery. In this study, an order-classifier was built to predict a series of toxic effects from chemical-chemical interaction data, under the assumption that interactive compounds are more likely to share similar toxicity profiles. According to its interaction confidence scores, each compound was assigned an ordering of toxicities from the most likely to the least likely. Ten test groups, each containing one training dataset and one test dataset, were constructed from a benchmark dataset of 17,233 compounds. By a jackknife test on each of these test groups, the 1st order prediction accuracies of the training dataset and the test dataset were all approximately 79.50%, substantially higher than the rate of 25.43% achieved by random guesses. Encouraged by these promising results, we expect that our method will become a useful tool for screening out drugs with high toxicity.
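The 1st order prediction accuracy reported above can be computed along the following lines. This sketch assumes the metric is the fraction of compounds whose top-ranked prediction is one of their true labels, which is an interpretation rather than something the abstract defines. The function name and toy labels are hypothetical.

```python
def first_order_accuracy(rankings, true_labels):
    """Fraction of samples whose top-ranked prediction is a true label."""
    if not rankings:
        raise ValueError("at least one ranked sample is required")
    hits = 0
    for sample, ranked in rankings.items():
        # A sample with an empty ranking counts as a miss.
        if ranked and ranked[0] in true_labels[sample]:
            hits += 1
    return hits / len(rankings)


# Toy data (hypothetical compounds and toxicity classes).
rankings = {
    "c1": ["hepatotoxicity", "cardiotoxicity"],
    "c2": ["cardiotoxicity"],
    "c3": [],
}
true_labels = {
    "c1": {"hepatotoxicity"},
    "c2": {"neurotoxicity"},
    "c3": {"cardiotoxicity"},
}
accuracy = first_order_accuracy(rankings, true_labels)
print(f"1st order accuracy: {accuracy:.2%}")
```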
Project description:A drug side effect is an undesirable effect that occurs in addition to the intended therapeutic effect of the drug. The unexpected side effects that many patients suffer from are the major cause of large-scale drug withdrawal. To address this problem, the pharmaceutical industry has a strong demand for computational methods that predict the side effects of drugs. In this study, a novel computational method was developed to predict the side effects of drug compounds by hybridizing chemical-chemical and protein-chemical interactions. Unlike most previous work, our method can rank the potential side effects of any query drug according to their predicted level of risk. To evaluate the method, a training dataset and test datasets were constructed from a benchmark dataset of 835 drug compounds. By a jackknife test on the training dataset, the 1st order prediction accuracy was 86.30%, while it was 89.16% on the test dataset. It is expected that the new method may become a useful tool for drug design, and that the findings obtained by hybridizing various interactions in a network system may also provide useful insights for in-depth pharmacological research, particularly at the level of systems biomedicine.
Project description:BACKGROUND:Drug repositioning, also known as drug repurposing, defines new indications for existing drugs and can be used as an alternative to drug development. In recent years, the accumulation of large volumes of information related to drugs and diseases has led to the development of various computational approaches for drug repositioning. Although herbal medicines have had a great impact on current drug discovery, there are still a large number of herbal compounds that have no definite indications. RESULTS:In the present study, we constructed a computational model to predict the unknown pharmacological effects of herbal compounds using machine learning techniques. Based on the assumption that similar diseases can be treated with similar drugs, we used four categories of drug-drug similarity (chemical structure, side effects, gene ontology, and targets) and three categories of disease-disease similarity (phenotypes, human phenotype ontology, and gene ontology). Then, associations between drug and disease were predicted using the employed similarity features. The prediction models were constructed using classification algorithms, including logistic regression, random forest and support vector machine algorithms. Upon cross-validation, the random forest approach showed the best performance (AUC = 0.948) and also performed well in an external validation assessment using an unseen independent dataset (AUC = 0.828). Finally, the constructed model was applied to predict potential indications for existing drugs and herbal compounds. As a result, new indications for 20 existing drugs and 31 herbal compounds were predicted and validated using clinical trial data. CONCLUSIONS:The predicted results were validated manually, confirming the performance and underlying mechanisms; for example, irinotecan was predicted as a treatment for neuroblastoma. 
Based on the predictions, herbal compounds were identified as drug candidates for related diseases and merit further development. The proposed prediction model can contribute to drug discovery by suggesting drug candidates from herbal compounds that have potential but have rarely been studied.
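The AUC values quoted above summarize how well a model ranks true drug-disease associations above non-associations. A minimal sketch of the metric, using its pairwise-ranking definition, is given below. The function name and toy scores are hypothetical, and this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = known drug-disease association, 0 = no known association.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

The double loop is quadratic in the number of examples, which is fine for illustration. A rank-based formulation would be preferable for large datasets.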
Project description:The network structure of biological systems suggests that effective therapeutic intervention may require combinations of agents that act synergistically. However, a dearth of systematic chemical combination datasets has limited the development of predictive algorithms for chemical synergism. Here, we report two large datasets of linked chemical-genetic and chemical-chemical interactions in the budding yeast Saccharomyces cerevisiae. We screened 5,518 unique compounds against 242 diverse yeast gene deletion strains to generate an extended chemical-genetic matrix (CGM) of 492,126 chemical-gene interaction measurements. This CGM dataset contained 1,434 genotype-specific inhibitors, termed cryptagens. We selected 128 structurally diverse cryptagens and tested all pairwise combinations to generate a benchmark dataset of 8,128 pairwise chemical-chemical interaction tests for synergy prediction, termed the cryptagen matrix (CM). An accompanying database resource called ChemGRID was developed to enable analysis, visualisation and downloads of all data. The CGM and CM datasets will facilitate the benchmarking of computational approaches for synergy prediction, as well as chemical structure-activity relationship models for anti-fungal drug discovery.
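The size of the cryptagen matrix follows from counting unordered pairs of the 128 selected compounds: 128 × 127 / 2 = 8,128. The short check below enumerates those pairs, assuming that self-combinations are excluded, which is consistent with the reported count. Compound indices stand in for the real cryptagens.

```python
import math
from itertools import combinations

n_cryptagens = 128

# Enumerate every unordered pair of distinct compounds (placeholder indices).
pairs = list(combinations(range(n_cryptagens), 2))

# The closed-form binomial coefficient gives the same count.
n_pairs = math.comb(n_cryptagens, 2)
print(len(pairs), n_pairs)
```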
Project description:Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments of genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher order features of gene products. Errors introduced in this process amplify in subsequent gene order analyses and thus may deteriorate gene cluster prediction. In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction suitable in scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure of the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes. Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions which are missed by gene family-based methods due to wrong or deficient gene family assignments.
Project description:Natural products have been an important source of lead compounds for drug discovery. How to find and evaluate bioactive natural products is critical to the success of drug/lead discovery from natural products. We collected 197,201 natural product structures, together with reported biological activities and virtual screening results. Principal component analysis was employed to explore the chemical space, and we found a large overlap between natural products and FDA-approved drugs in this space, which indicates that natural products contain a large number of potential lead compounds. We also explored the network properties of natural product-target networks and found that polypharmacology was greatly enriched in compounds with large degree and high betweenness centrality. To make up for a lack of experimental data, high throughput virtual screening was employed. All natural products were docked to 332 target proteins of FDA-approved drugs. The most promising natural products for drug discovery and their indications were predicted based on a docking score-weighted prediction model. Analysis of the molecular descriptors, distribution in chemical space and biological activities of natural products was conducted in this article. Natural products have vast chemical diversity and good drug-like properties, and can interact with multiple cellular target proteins.
Project description:The Blood-Brain-Barrier (BBB) is a rigorous permeability barrier for maintaining homeostasis of the Central Nervous System (CNS). Determining a compound's BBB permeability is a prerequisite in CNS drug discovery. Existing computational methods usually predict drug BBB permeability from chemical structure, and they generally apply to small compounds that pass the BBB through passive diffusion. As abundant information on drug side effects and indications has been recorded over time through extensive clinical usage, we aim to explore BBB permeability prediction from a new angle and introduce a novel approach that predicts BBB permeability from drug clinical phenotypes (drug side effects and drug indications). This method can apply to both small compounds and macro-molecules that penetrate the BBB through various mechanisms besides passive diffusion. We composed a training dataset of 213 drugs with known brain-to-blood steady-state concentration ratios and extracted their side effects and indications as features. Next, we trained SVM models with a polynomial kernel and obtained an accuracy of 76.0%, AUC of 0.739, and F1 score (macro weighted) of 0.760 with Monte Carlo cross validation. The independent test accuracy was 68.3%, with AUC 0.692 and F1 score 0.676. When both chemical features and clinical phenotypes were available, combining the two types of features achieved significantly better performance than the chemical feature based approach (accuracy 85.5% versus 72.9%, AUC 0.854 versus 0.733, F1 score 0.854 versus 0.725; P < e^-90). We also conducted de novo prediction and identified 110 drugs in the SIDER database with the potential to penetrate the BBB, which could serve as a starting point for CNS drug repositioning research. Availability: https://github.com/bioinformatics-gao/CASE-BBB-prediction-Data. Supplementary data are available at Bioinformatics online.
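Monte Carlo cross validation, as named above, repeatedly draws random train/test splits instead of using fixed folds. The sketch below illustrates only the splitting step for a dataset of 213 drugs. The number of repeats, the test fraction, the seed, and the function name are assumptions, since the abstract does not state them.

```python
import random


def monte_carlo_splits(n_samples, n_repeats=100, test_fraction=0.2, seed=0):
    """Yield (train, test) index lists from repeated random subsampling."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random permutation for every repeat
        yield indices[n_test:], indices[:n_test]


# Assumed settings: 3 repeats and a 20% test fraction, for illustration only.
splits = list(monte_carlo_splits(213, n_repeats=3, test_fraction=0.2, seed=0))
for train, test in splits:
    print(len(train), len(test))
```

Unlike k-fold cross-validation, the test sets of different repeats may overlap, and a given drug may never appear in any test set.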
Project description:Studies on second-order false belief reasoning have generally focused on investigating the roles of executive functions and language with correlational studies. Unlike those studies, we focus on the question of how 5-year-olds select and revise reasoning strategies in second-order false belief tasks by constructing two computational cognitive models of this process: an instance-based learning model and a reinforcement learning model. Unlike the reinforcement learning model, the instance-based learning model predicted that children who fail second-order false belief tasks would give answers based on first-order theory of mind (ToM) reasoning as opposed to zero-order reasoning. This prediction was confirmed with an empirical study that we conducted with 72 5- to 6-year-old children. The results showed that 17% of the answers were correct and 83% of the answers were wrong. In line with our prediction, 65% of the wrong answers were based on a first-order ToM strategy, while only 29% of them were based on a zero-order strategy (the remaining 6% of subjects did not provide any answer). Based on our instance-based learning model, we propose that when children get the feedback "Wrong," they explicitly revise their strategy to a higher level instead of implicitly selecting one of the available ToM strategies. Moreover, we predict that children's failures are due to a lack of experience and that, with exposure to second-order false belief reasoning, children can revise their wrong first-order reasoning strategy to a correct second-order reasoning strategy.
Project description:Oral bioavailability is a key consideration in the development of drug products, and the use of preclinical species to predict bioavailability in humans has long been debated. To clarify whether any correlation between human and animal bioavailability exists, an extensive analysis of published literature data was conducted. Due to the complex nature of bioavailability calculations, inclusion criteria were applied to ensure the integrity of the data. A database of 184 compounds was assembled. Linear regression on the reported compounds indicated no strong or predictive correlation with human data for any species, individually or combined. The lack of correlation in this extended dataset highlights that animal bioavailability is not quantitatively predictive of bioavailability in humans. Although qualitative (high/low bioavailability) indications might be possible, models that take into account species-specific factors affecting bioavailability are recommended for developing quantitative predictions.
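The kind of linear regression analysis described above can be sketched with an ordinary least-squares fit and its coefficient of determination. The function name is hypothetical, and the bioavailability values are invented toy numbers, not entries from the 184-compound database.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Invented toy values: percent oral bioavailability in an animal species
# and in humans for six hypothetical compounds.
animal = [5, 20, 35, 50, 80, 95]
human = [60, 10, 90, 30, 45, 70]
slope, intercept, r_squared = linear_fit(animal, human)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

An R^2 close to zero means animal values explain little of the variance in the human values.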