Project description:BACKGROUND: Adverse drug reactions (ADRs) are unintended and harmful reactions caused by the normal use of drugs. Predicting and preventing ADRs in the early stages of the drug development pipeline can help to enhance drug safety and reduce financial costs. METHODS: In this paper, we developed machine learning models, including a deep learning framework, which can simultaneously predict ADRs and identify the molecular substructures associated with those ADRs without defining the substructures a priori. RESULTS: We evaluated the performance of our model against ten different state-of-the-art fingerprint models and found that neural fingerprints from the deep learning model outperformed all other methods in predicting ADRs. Via feature analysis on drug structures, we identified important molecular substructures that are associated with specific ADRs and assessed their associations via statistical analysis. CONCLUSIONS: The deep learning model with feature analysis, substructure identification, and statistical assessment provides a promising solution for identifying risky components within molecular structures and can potentially help to improve drug safety evaluation.
Project description:Hypothyroidism is a known adverse event associated with the use of immune checkpoint inhibitors (ICIs) in cancer treatment. This study aimed to develop an interpretable machine learning (ML) model for individualized prediction of hypothyroidism in patients treated with ICIs. The retrospective cohort of patients treated with ICIs was from the First Affiliated Hospital of Ningbo University. The ML methods applied were logistic regression (LR), random forest classifier (RFC), support vector machine (SVM), and extreme gradient boosting (XGBoost). The area under the receiver-operating characteristic curve (AUC) was the main evaluation metric. Furthermore, Shapley additive explanations (SHAP) were used to interpret the outputs of the prediction model. A total of 458 patients were included in the study, of whom 59 (12.88%) developed hypothyroidism. Among the models evaluated, XGBoost exhibited the highest predictive capability (AUC = 0.833). The DeLong test and calibration curve indicated that XGBoost significantly outperformed the other models in prediction. The SHAP method revealed that thyroid-stimulating hormone (TSH) was the most influential predictor variable. The developed interpretable ML model holds potential for predicting the likelihood of hypothyroidism following ICI treatment. ML technology offers new possibilities for predicting ICI-induced hypothyroidism, potentially providing more precise support for personalized treatment and risk management.
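As an illustration of the main evaluation metric named above, the following is a minimal sketch of how an AUC can be computed from binary labels and model scores, using the pairwise (Mann-Whitney) interpretation. The function name `roc_auc` and the toy data are assumptions made for this example and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed hypothyroidism, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)
print(auc)
```

The pairwise form is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; a rank-based implementation would be preferable for larger datasets.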
Project description:Pathologists are responsible for rapidly providing a diagnosis on critical health issues. Challenging cases benefit from additional opinions of pathologist colleagues. In addition to on-site colleagues, there is an active worldwide community of pathologists on social media for complementary opinions. Such access to pathologists worldwide has the capacity to improve diagnostic accuracy and generate broader consensus on next steps in patient care. From Twitter we curate 13,626 images from 6,351 tweets from 25 pathologists from 13 countries. We supplement the Twitter data with 113,161 images from 1,074,484 PubMed articles. We develop machine learning and deep learning models to (i) accurately identify histopathology stains, (ii) discriminate between tissues, and (iii) differentiate disease states. Area Under Receiver Operating Characteristic (AUROC) is 0.805-0.996 for these tasks. We repurpose the disease classifier to search for similar disease states given an image and clinical covariates. We report precision@k=1 of 0.7618 ± 0.0018 (chance 0.397 ± 0.004, mean ± stdev). The classifiers find that texture and tissue are important clinico-visual features of disease. Deep features trained only on natural images (e.g., cats and dogs) substantially improved search performance, while pathology-specific deep features and cell nuclei features further improved search to a lesser extent. We implement a social media bot (@pathobot on Twitter) to use the trained classifiers to aid pathologists in obtaining real-time feedback on challenging cases. If a social media post containing pathology text and images mentions the bot, the bot generates quantitative predictions of disease state (normal/artifact/infection/injury/nontumor, preneoplastic/benign/low-grade-malignant-potential, or malignant) and lists similar cases across social media and PubMed.
Our project has become a globally distributed expert system that facilitates pathological diagnosis and brings expertise to underserved regions or hospitals with less expertise in a particular disease. This is the first pan-tissue pan-disease (i.e., from infection to malignancy) method for prediction and search on social media, and the first pathology study prospectively tested in public on social media. We will share data through http://pathobotology.org. We expect our project to cultivate a more connected world of physicians and improve patient care worldwide.
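The search metric reported in this abstract, precision@k, can be sketched as follows. This is a minimal illustration only: the function name `precision_at_k`, the data layout, and the toy labels are assumptions for the example, and the study's actual search pipeline is not reproduced here.

```python
def precision_at_k(query_labels, retrieved_labels, k=1):
    """Mean fraction of the top-k retrieved cases that share the
    query's disease-state label, averaged over all queries."""
    if k < 1:
        raise ValueError("k must be at least 1")
    total = 0.0
    for query, ranked in zip(query_labels, retrieved_labels):
        top = ranked[:k]
        total += sum(1 for label in top if label == query) / k
    return total / len(query_labels)


# Hypothetical toy example: three queries, each with a ranked result list.
queries = ["malignant", "benign", "nontumor"]
results = [
    ["malignant", "benign"],    # top hit matches the query
    ["malignant", "benign"],    # top hit does not match
    ["nontumor", "nontumor"],   # top hit matches
]
p_at_1 = precision_at_k(queries, results, k=1)
print(p_at_1)
```

With k=1 the metric reduces to the share of queries whose single best match has the same label as the query.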
Project description:The widespread adoption of deep learning to build models that capture the dynamics of neural populations is typically based on "black-box" approaches that lack an interpretable link between neural activity and function. Here, we propose to apply algorithm unrolling, a method for interpretable deep learning, to design the architecture of sparse deconvolutional neural networks and obtain a direct interpretation of network weights in relation to stimulus-driven single-neuron activity through a generative model. We characterize our method, referred to as deconvolutional unrolled neural learning (DUNL), and show its versatility by applying it to deconvolve single-trial local signals across multiple brain areas and recording modalities. To exemplify use cases of our decomposition method, we uncover multiplexed salience and reward prediction error signals from midbrain dopamine neurons in an unbiased manner, perform simultaneous event detection and characterization in somatosensory thalamus recordings, and characterize the responses of neurons in the piriform cortex. Our work leverages the advances in interpretable deep learning to gain a mechanistic understanding of neural dynamics.
Project description:Applying deep learning in population genomics is challenging because of computational issues and lack of interpretable models. Here, we propose GenNet, a novel open-source deep learning framework for predicting phenotypes from genetic variants. In this framework, interpretable and memory-efficient neural network architectures are constructed by embedding biological knowledge from public databases, resulting in neural networks that contain only biologically plausible connections. We applied the framework to seventeen phenotypes and found well-replicated genes such as HERC2 and OCA2 for hair and eye color, and novel genes such as ZNF773 and PCNT for schizophrenia. Additionally, the framework identified ubiquitin-mediated proteolysis, endocrine system and viral infectious diseases as the most predictive biological pathways for schizophrenia. GenNet is a freely available, end-to-end deep learning framework that allows researchers to develop and use interpretable neural networks to obtain novel insights into the genetic architecture of complex traits and diseases.
Project description:Improving the accuracy of toxicity prediction models for liver injuries is a key element in evaluating the safety of drugs and chemicals. Mechanism-based information derived from expression (transcriptomic) data, in combination with machine-learning methods, promises to improve the accuracy and robustness of current toxicity prediction models. Deep neural networks (DNNs) have the advantage of automatically assembling the relevant features from a large number of input features. This makes them especially suitable for modeling transcriptomic data, which typically contain thousands of features. Here, we gauged gene- and pathway-level feature selection schemes using single- and multi-task DNN approaches in predicting chemically induced liver injuries (biliary hyperplasia, fibrosis, and necrosis) from whole-genome DNA microarray data. The single-task DNN models showed high predictive accuracy and endpoint specificity, with Matthews correlation coefficients for the three endpoints on 10-fold cross-validation ranging from 0.56 to 0.89, with an average of 0.74 in the best feature sets. The DNN models outperformed Random Forest models in cross-validation and showed better performance than Support Vector Machine models when tested on the external validation datasets. In the cross-validation studies, the effect of the feature selection scheme was negligible among the studied feature sets. Further evaluation of the models on their ability to predict the injury phenotype per se for non-chemically induced injuries revealed the robust performance of the DNN models across these additional external testing datasets. Thus, the DNN models learned features specific to the injury phenotype contained in the gene expression data.
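For readers unfamiliar with the Matthews correlation coefficient reported above, here is a minimal sketch of its standard definition from a binary confusion matrix. The function name and the counts in the usage example are hypothetical and are not results from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    returns 0.0 by convention when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one injury endpoint (e.g., fibrosis vs. no fibrosis).
mcc = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
print(round(mcc, 3))
```

Unlike plain accuracy, this coefficient uses all four cells of the confusion matrix, which makes it informative when the injury classes are imbalanced.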
Project description:The dynamics of neuron populations commonly evolve on low-dimensional manifolds. Thus, we need methods that learn the dynamical processes over neural manifolds to infer interpretable and consistent latent representations. We introduce a representation learning method, MARBLE, which decomposes on-manifold dynamics into local flow fields and maps them into a common latent space using unsupervised geometric deep learning. In simulated nonlinear dynamical systems, recurrent neural networks and experimental single-neuron recordings from primates and rodents, we discover emergent low-dimensional latent representations that parametrize high-dimensional neural dynamics during gain modulation, decision-making and changes in the internal state. These representations are consistent across neural networks and animals, enabling the robust comparison of cognitive computations. Extensive benchmarking demonstrates state-of-the-art within- and across-animal decoding accuracy of MARBLE compared to current representation learning approaches, with minimal user input. Our results suggest that a manifold structure provides a powerful inductive bias to develop decoding algorithms and assimilate data across experiments.
Project description:Background: Predicting the outcome of breast cancer is important for selecting appropriate treatments and prolonging the survival periods of patients. Recently, different deep learning-based methods have been carefully designed for cancer outcome prediction. However, the application of these methods is still challenged by interpretability. In this study, we proposed a novel multitask deep neural network called UISNet to predict the outcome of breast cancer. The UISNet is able to interpret the importance of features for the prediction model via an uncertainty-based integrated gradients algorithm. UISNet improved the prediction by introducing prior biological pathway knowledge and utilizing patient heterogeneity information. Results: The model was tested in seven public datasets of breast cancer, and showed better performance (average C-index = 0.691) than the state-of-the-art methods (average C-index = 0.650, ranged from 0.619 to 0.677). Importantly, the UISNet identified 20 genes as associated with breast cancer, among which 11 have been proven to be associated with breast cancer by previous studies, and others are novel findings of this study. Conclusions: Our proposed method is accurate and robust in predicting breast cancer outcomes, and it is an effective way to identify breast cancer-associated genes. The method codes are available at: https://github.com/chh171/UISNet.
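The C-index used above to compare outcome-prediction models can be sketched as follows. This is a minimal illustration of Harrell's concordance index under simple assumptions (pairs with tied event times are skipped); the function name and toy data are hypothetical and this is not the UISNet evaluation code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: among comparable patient pairs, the fraction in
    which the patient with the earlier observed event has the higher
    predicted risk. Tied risks count as half concordant."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i had an observed event
            # (not censored) strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up times, event flags (0 = censored), risk scores.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported averages of 0.691 and 0.650.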
Project description:Background: Long-term monitoring of electrocardiogram (ECG) recordings is crucial to diagnose arrhythmias. Clinicians can find it challenging to diagnose arrhythmias, and this is a particular issue in more remote and underdeveloped areas. The development of digital ECG and AI methods could assist clinicians who need to diagnose arrhythmias outside of the hospital setting. Methods: We constructed a large-scale Chinese ECG benchmark dataset using data from 272,753 patients collected from January 2017 to December 2021. The dataset contains ECG recordings from all common arrhythmias present in the Chinese population. Several experienced cardiologists from Shanghai First People's Hospital labeled the dataset. We then developed a deep learning-based multi-label interpretable diagnostic model from the ECG recordings. We utilized accuracy, F1 score and AUC-ROC to compare the performance of our model with that of the cardiologists, as well as with six comparison models, using testing and hidden datasets. Results: The results show that our approach achieves an F1 score of 83.51%, an average AUC-ROC score of 0.977, and 93.74% mean accuracy for 6 common arrhythmias. Results from the hidden dataset demonstrate that the performance of our approach exceeds that of cardiologists. Our approach also highlights the diagnostic process. Conclusions: Our diagnosis system has superior diagnostic performance over that of clinicians. It also has the potential to help clinicians rapidly identify abnormal regions on ECG recordings, thus improving efficiency and accuracy of clinical ECG diagnosis in China. This approach could therefore potentially improve the productivity of out-of-hospital ECG diagnosis and provides a promising prospect for telemedicine.