Project description:Increasing age is a risk factor for many diseases; therefore developing pharmacological interventions that slow down ageing and consequently postpone the onset of many age-related diseases is highly desirable. In this work we analyse data from the DrugAge database, which contains chemical compounds and their effect on the lifespan of model organisms. Predictive models were built using the machine learning method random forests to predict whether or not a chemical compound will increase Caenorhabditis elegans' lifespan, using as features Gene Ontology (GO) terms annotated for proteins targeted by the compounds and chemical descriptors calculated from each compound's chemical structure. The model with the best predictive accuracy used both biological and chemical features, achieving a prediction accuracy of 80%. The top 20 most important GO terms include those related to mitochondrial processes, to enzymatic and immunological processes, and terms related to metabolic and transport processes. We applied our best model to predict compounds which are more likely to increase C. elegans' lifespan in the DGIdb database, where the effect of the compounds on an organism's lifespan is unknown. The top hit compounds can be broadly divided into four groups: compounds affecting mitochondria, compounds for cancer treatment, anti-inflammatories, and compounds for gonadotropin-releasing hormone therapies.
Project description:There is an urgent need for the identification of effective therapeutics for COVID-19 and we have developed a machine learning drug discovery pipeline to identify several drug candidates. First, we collect assay data for 65 target human proteins known to interact with the SARS-CoV-2 proteins, including the ACE2 receptor. Next, we train machine learning models to predict inhibitory activity and use them to screen FDA-registered chemicals and approved drugs (~100,000) and ~14 million purchasable chemicals. We filter predictions according to estimated mammalian toxicity and vapor pressure. Prospective volatile candidates are proposed as novel inhaled therapeutics since the nasal cavity and respiratory tracts are early bottlenecks for infection. We also identify candidates that act across multiple targets as promising for future analyses. We anticipate that this theoretical study can accelerate testing of two categories of therapeutics: repurposed drugs suited for short-term approval, and novel efficacious drugs suitable for long-term follow-up.
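The filtering steps described above (toxicity, vapor pressure, multi-target activity) can be sketched roughly as follows. This is only an illustration: the cutoffs, field names, and compound names are hypothetical assumptions, not values or identifiers from the study.

```python
from dataclasses import dataclass

# All thresholds below are illustrative assumptions, not values from the study.
MIN_LD50_MG_PER_KG = 500.0      # assumed cutoff: higher estimated LD50 = less toxic
MIN_VAPOR_PRESSURE_MMHG = 0.01  # assumed cutoff for calling a compound volatile
MIN_TARGETS_FOR_MULTI = 2       # assumed meaning of "multiple targets"


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_targets: frozenset   # targets the models predict the compound inhibits
    est_ld50_mg_per_kg: float      # estimated mammalian toxicity
    est_vapor_pressure_mmhg: float


def passes_toxicity(c: Candidate) -> bool:
    return c.est_ld50_mg_per_kg >= MIN_LD50_MG_PER_KG


def is_volatile(c: Candidate) -> bool:
    return c.est_vapor_pressure_mmhg >= MIN_VAPOR_PRESSURE_MMHG


def is_multi_target(c: Candidate) -> bool:
    return len(c.predicted_targets) >= MIN_TARGETS_FOR_MULTI


def triage(candidates):
    """Keep predicted actives that pass the toxicity filter, then group them."""
    safe = [c for c in candidates if c.predicted_targets and passes_toxicity(c)]
    return {
        "inhaled": [c.name for c in safe if is_volatile(c)],
        "multi_target": [c.name for c in safe if is_multi_target(c)],
    }


# Made-up compounds, used only to exercise the filters.
candidates = [
    Candidate("compound_A", frozenset({"ACE2"}), 2000.0, 1.5),
    Candidate("compound_B", frozenset({"ACE2", "target_X"}), 900.0, 0.0001),
    Candidate("compound_C", frozenset({"target_X", "target_Y"}), 50.0, 3.0),
]
result = triage(candidates)
```

Here `compound_C` is dropped by the toxicity filter even though it is volatile and multi-target, which mirrors the order of operations in the description: toxicity screening comes before any grouping.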
Project description:With the rapidly evolving SARS-CoV-2 variants of concern, there is an urgent need for the discovery of further treatments for the coronavirus disease (COVID-19). Drug repurposing is one of the most rapid strategies for addressing this need, and numerous compounds have already been selected for in vitro testing by several groups. These have led to a growing database of molecules with in vitro activity against the virus. Machine learning models can assist drug discovery through prediction of the best compounds based on previously published data. Herein, we have implemented several machine learning methods to develop predictive models from recent SARS-CoV-2 in vitro inhibition data and used them to prioritize additional FDA-approved compounds for in vitro testing selected from our in-house compound library. From the compounds predicted with a Bayesian machine learning model, lumefantrine, an antimalarial, was selected for testing and showed limited antiviral activity in cell-based assays while demonstrating binding (Kd 259 nM) to the spike protein using microscale thermophoresis. Several other compounds which we prioritized have since been tested by others and were also found to be active in vitro. This combined machine learning and in vitro testing approach can be expanded to virtually screen available molecules with predicted activity against SARS-CoV-2 reference WIV04 strain and circulating variants of concern. In the process of this work, we have created multiple iterations of machine learning models that can be used as a prioritization tool for SARS-CoV-2 antiviral drug discovery programs. The very latest model for SARS-CoV-2 with over 500 compounds is now freely available at www.assaycentral.org.
Project description:Oxidative stress is a well-established risk factor for numerous chronic diseases, emphasizing the need for efficient identification of potent antioxidants. Conventional methods for assessing antioxidant properties are often time-consuming and resource-intensive, typically relying on laborious biochemical assays. In this study, we investigated the applicability of machine learning (ML) algorithms for predicting the antioxidant activity of compounds based solely on their molecular structure. We evaluated the performance of five ML algorithms, Support Vector Machine (SVM), Logistic Regression (LR), XGBoost, Random Forest (RF), and Deep Neural Network (DNN), using a dataset of over 1,900 compounds with experimentally determined antioxidant activity. Both RF and SVM achieved the best overall performance, exhibiting high accuracy (> 0.9) and effectively distinguishing active and inactive compounds with high structural similarity. External validation using natural product data from the BATMAN database confirmed the generalizability of the RF and SVM models. Our results suggest that ML models serve as powerful tools to expedite the discovery of novel antioxidant candidates, potentially streamlining the development of future therapeutic interventions.
Project description:High drug development costs and the limited number of new annual drug approvals increase the need for innovative approaches for drug effect prediction. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), led to a global pandemic with high morbidity and mortality. Although effective preventive measures exist, there are few effective treatments for hospitalized patients with SARS-CoV-2 infection. Drug repurposing and drug effect prediction are promising strategies that could shorten development time and reduce costs compared with de novo drug discovery. In this work, we present a machine learning framework to integrate a variety of target network features and physicochemical properties of compounds, and analyze their influence on the therapeutic effects for SARS-CoV-2 infection and on host cell cytotoxic effects. Random forest models trained on compounds with known experimental effects on SARS-CoV-2 infection and subsequent feature importance analysis based on Shapley values provided insights into the determinants of drug efficacy and cytotoxicity, which can be incorporated into novel drug discovery approaches. Given the complexity of molecular mechanisms of drug action and limited sample sizes, our models achieve a reasonable mean area under the receiver operating characteristic curve (ROC-AUC) of 0.73 on an unseen validation set. To our knowledge, this is the first work to incorporate a combination of network and physicochemical features of compounds into a machine learning model to predict drug effects on SARS-CoV-2 infection. Our systems pharmacology-based machine learning framework can be used to classify other existing drugs for SARS-CoV-2 infection and can easily be adapted to drug effect prediction for future viral outbreaks.
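The ROC-AUC quoted above can be read as the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. A minimal sketch of that pairwise calculation, using made-up labels and scores rather than data from the study:

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = effective compound, 0 = ineffective compound.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.5]
auc = roc_auc(labels, scores)  # 6 of 9 pairs are ranked correctly
```

The quadratic loop is chosen for readability; for large datasets a rank-based formulation gives the same value more efficiently.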
Project description:We have trained the Extreme Minimum Learning Machine (EMLM) machine learning model to predict chemical potentials of individual conformers of multifunctional organic compounds containing carbon, hydrogen, and oxygen. The model is able to predict chemical potentials of molecules that are in the size range of the training data with a root-mean-square error (RMSE) of 0.5 kcal/mol. There is also a linear correlation between calculated and predicted chemical potentials of molecules that are larger than those included in the training set. Finding the lowest chemical potential conformers is useful in condensed phase thermodynamic property calculations, in order to reduce the number of computationally demanding density functional theory calculations.
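As a small illustration of the two quantities discussed above, the sketch below computes an RMSE between reference and predicted chemical potentials and then picks the conformers with the lowest predicted values for follow-up calculations. The numbers and conformer names are invented for the example, and the helper names are hypothetical.

```python
import math


def rmse(reference, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def lowest_k(predicted_by_conformer, k):
    """Names of the k conformers with the lowest predicted chemical potential."""
    ranked = sorted(predicted_by_conformer, key=predicted_by_conformer.get)
    return ranked[:k]


# Made-up chemical potentials in kcal/mol (not data from the study).
reference = [-10.0, -12.5, -8.0, -15.0]
predicted = [-10.5, -12.0, -8.5, -14.5]
error = rmse(reference, predicted)  # every residual is 0.5 in magnitude

predicted_by_conformer = {"conf_1": -10.5, "conf_2": -12.0, "conf_3": -8.5, "conf_4": -14.5}
shortlist = lowest_k(predicted_by_conformer, 2)
```

Only the shortlisted conformers would then be passed on to the more expensive density functional theory step.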
Project description:COVID-19, caused by SARS-CoV-2, is a current global challenge, and the discovery of potential drugs to combat this pandemic is urgently needed. The 3-chymotrypsin-like cysteine protease (3CLpro) enzyme is a vital molecular target in SARS-CoV-2. Therefore, in the present study, 1528 anti-HIV-1 compounds were screened by sequence alignment between 3CLpro of SARS-CoV-2 and avian infectious bronchitis virus (avian coronavirus), followed by a machine learning predictive model, drug-likeness screening and molecular docking, which resulted in 41 screened compounds. These 41 compounds were re-screened by a deep learning model constructed considering the IC50 values of known inhibitors, which resulted in 22 hit compounds. Further screening was done by structural activity relationship mapping, which resulted in two structural clefts. Thereafter, functional group analysis was also done, where cluster 2 showed the presence of several essential functional groups having pharmacological importance. In the final stage, cluster 2 compounds were re-docked with four different PDB structures of 3CLpro, and their in-depth interaction profile was analyzed, followed by molecular dynamics simulation at 100 ns. In conclusion, 2 of the 1528 compounds were identified as potential hits against 3CLpro that could be developed further as drugs against SARS-CoV-2.
Project description:Introduction: Although clinical, functional, and biomarker data predict asthma exacerbations, newer approaches providing high accuracy of prognosis are needed for real-world decision-making in asthma. Machine learning (ML) leverages mathematical and statistical methods to detect patterns for future disease events across large datasets from electronic health records (EHR). This study conducted training and fine-tuning of ML algorithms for the real-world prediction of asthma exacerbations in patients with physician-diagnosed asthma. Methods: Adults with ≥ 2 ICD9/10 asthma codes within 1 year and at least 30 days apart were identified from the Optum Panther EHR database between 2016 and 2023. An emergency department (ED), urgent care, or inpatient visit for asthma, while on systemic administration of corticosteroids, was considered an exacerbation. To predict factors associated with exacerbations in a 6-month study period, clinical information from patients was retrieved in the preceding 6-month baseline period. Clinical information included demographics, lab results, diagnoses, medications, immunizations, and allergies. Three models built using Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and Transformers algorithms were trained and tested on independent datasets. Predictions were explained using the SHAP (SHapley Additive exPlanations) library. Results: Of 1,331,934 patients with asthma, 16,279 (1.2%) experienced ≥ 1 exacerbation. XGBoost was the best predictive algorithm (area under the curve [AUC] = 0.964). Factors associated with exacerbations included a prior history of exacerbation, prednisone usage, high-dose albuterol usage, and elevated troponin I. Reduced probability of exacerbations was associated with receiving inhaled albuterol, vitamins, aspirin, statins, furosemide, and influenza vaccination. Conclusion: This ML-based study on asthma in the real world confirmed previously known features associated with increased exacerbation risk for asthma, while uncovering not entirely understood features associated with reduced risk of asthma exacerbations. These findings are hypothesis-generating and should contribute to ongoing discussion of the strengths and limitations of ML and other supervised learning models in patient risk stratification.
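The cohort inclusion rule from the Methods (at least two asthma codes within 1 year and at least 30 days apart) can be sketched as a simple date check. This is one possible reading of the rule, not the study's implementation: the function name is hypothetical and "within 1 year" is assumed to mean at most 365 days. The last lines reproduce the reported exacerbation rate from the counts given in the Results.

```python
from datetime import date
from itertools import combinations


def meets_asthma_code_rule(code_dates, min_gap_days=30, max_span_days=365):
    """True if any two coded encounters are >= 30 days apart and within 1 year.

    Assumption: "within 1 year" is interpreted as a gap of at most 365 days.
    """
    for earlier, later in combinations(sorted(code_dates), 2):
        gap = (later - earlier).days
        if min_gap_days <= gap <= max_span_days:
            return True
    return False


# Made-up encounter dates for three hypothetical patients.
too_close = [date(2020, 1, 1), date(2020, 1, 15)]   # only 14 days apart
eligible = [date(2020, 1, 1), date(2020, 3, 1)]     # 60 days apart
too_far = [date(2020, 1, 1), date(2021, 6, 1)]      # more than a year apart

# Exacerbation rate from the counts reported in the Results.
patients, exacerbators = 1_331_934, 16_279
rate_percent = round(100 * exacerbators / patients, 1)
```

A rate this low means the positive class is rare, so metrics such as AUC should be read alongside the class balance when judging the models.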
Project description:Recently, there has been a growing interest in the development of pharmacological interventions targeting ageing, as well as in the use of machine learning for analysing ageing-related data. In this work, we use machine learning methods to analyse data from DrugAge, a database of chemical compounds (including drugs) modulating lifespan in model organisms. To this end, we created four types of datasets for predicting whether or not a compound extends the lifespan of C. elegans (the most frequent model organism in DrugAge), using four different types of predictive biological features, based on: compound-protein interactions, interactions between compounds and proteins encoded by ageing-related genes, and two types of terms annotated for proteins targeted by the compounds, namely Gene Ontology (GO) terms and physiology terms from the WormBase's Phenotype Ontology. To analyse these datasets, we used a combination of feature selection methods in a data pre-processing phase and the well-established random forest algorithm for learning predictive models from the selected features. In addition, we interpreted the most important features in the two best models in light of the biology of ageing. One noteworthy feature was the GO term "Glutathione metabolic process", which plays an important role in cellular redox homeostasis and detoxification. We also predicted the most promising novel compounds for extending lifespan from a list of previously unlabelled compounds. These include nitroprusside, which is used as an antihypertensive medication. Overall, our work opens avenues for future work in employing machine learning to predict novel life-extending compounds.
Project description:Refractive index (RI) is one of the most important optical properties of materials. Due to the high importance of this physical parameter, there has always been a demand for a method that provides an accurate estimation. In this research, we utilize experimentally measured RI values of 272 inorganic compounds to build a machine learning model capable of predicting the RI of materials with low computational cost. Considering the significant relationship between the band gap and RI, we select this parameter as a predictor. In addition to the band gap, the atomic properties related to the building elements of the compounds form our data set in this work. To find the optimal model and set of suitable predictors, we examine our data in four categories with 1, 5, 10, and 21 features. In addition, we compare the predicted RIs of 6 different independent regression methods, namely, ordinary least squares (OLSR), Gaussian process (GPR), support vector (SVR), random forest (RFR), gradient boosted trees (GBTR), and extremely randomized trees regression (ERTR). We notice that ERTR predicts RI with the highest accuracy compared to other regression methods. The predictive strength of our model exceeds that of empirical relations and provides accurate results for a wide range of RIs. Thus, we demonstrate the high potential of machine learning methods for evaluating the RI, especially when it comes to providing an estimation of a desired physical quantity.