Project description:A continuum of macrophage polarization states is essential for tissue homeostasis. We used machine learning approaches to identify a universally relevant definition of macrophage polarization states and to create a predictive framework for developing macrophage-targeted precision diagnostics and therapeutics. CCDC88A was identified as a key gene in the continuum state clusters that is essential for the tolerant polarization state.
Project description:The T cell receptor (TCR) determines the specificity and affinity for both foreign and self-peptides presented by MHC. It is established that self-pMHC reactivity impacts T cell function, but it has been challenging to identify TCR sequence features that predict T cell fate. To discern patterns distinguishing TCRs from naïve CD4+ T cells with low versus high self-pMHC reactivity, we used data from 42 mice to train a machine learning (ML) algorithm that predicts self-reactivity directly from TCRβ sequences. This approach revealed that n-nucleotide additions and acidic amino acids weaken self-reactivity. We tested our ML predictions of TCRβ sequence self-reactivity using retrogenic mice. Extrapolating our analyses to independent datasets, we found high predicted self-reactivity for regulatory CD4+ T cells and low predicted self-reactivity for T cells responding to chronic infection. Our analyses suggest a potential trade-off between repertoire diversity and self-reactivity intrinsic to the architecture of a TCR repertoire.
Project description:Efforts to find potential biomarkers of tolerance after kidney transplantation have been hindered by limited sample size, as well as by the complicated mechanisms underlying tolerance and the potential risk of rejection after immunosuppressant withdrawal. In this work, three different publicly available genome-wide expression data sets of peripheral blood lymphocytes (PBL) from 63 tolerant patients were used to compare 14 different machine learning models for their ability to predict spontaneous kidney graft tolerance. We found that the Best Subset Selection (BSS) regression approach was the most powerful, with a sensitivity of 91.7% and a specificity of 93.8% in the test group, and a sensitivity of 80% and a specificity of 86.1% in the validation group. A feature set with five genes (HLA-DOA, TCL1A, EBF1, CD79B, and PNOC) was identified using the BSS model. EBF1 downregulation was also an independent factor predictive of graft rejection and graft loss. An AUC value of 84.4% was achieved using the two-gene signature (EBF1 and HLA-DOA) as an input to our classifier. Overall, our systematic machine learning exploration suggests novel biological targets that might affect tolerance to renal allografts, and provides clinical insights that can potentially guide patient selection for immunosuppressant withdrawal.
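For readers unfamiliar with the metrics reported in the kidney graft tolerance entry above, the following is a minimal sketch of how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration: the entry does not give the study's actual counts, and these numbers are simply ones whose ratios round to the reported test-group percentages.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: tolerant patients classified correctly / incorrectly.
    tn/fp: non-tolerant patients classified correctly / incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the study.
sens, spec = sensitivity_specificity(tp=11, fn=1, tn=15, fp=1)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Both metrics depend only on the counts within one true class, which is why they are usually reported as a pair rather than as a single accuracy figure.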
Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. This data is used to model a machine learning classifier for Estrogen Receptor Status. RNA was isolated from 199 primary breast cancer patients. A machine learning classifier was built to predict ER status using only three gene features.
Project description:We evaluated how well various supervised machine learning methods, such as decision tree, partial least squares discriminant analysis (PLSDA), support vector machine, and random forest, classify endometriosis versus control samples when trained on both transcriptomics and methylomics data. The assessment considered two ways of improving classification performance: (a) the effect of three different normalization techniques, and (b) the effect of differential analysis using the generalized linear model (GLM). We concluded that an appropriate machine learning diagnostic pipeline for endometriosis should use TMM normalization for transcriptomics data, quantile or voom normalization for methylomics data, and GLM for feature space reduction and classification performance maximization.
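As an illustration of one of the normalization techniques named in the endometriosis entry above, here is a minimal pure-Python sketch of quantile normalization. It is not the authors' pipeline: the function name and toy values are hypothetical, and ties are broken by position rather than averaged, which dedicated implementations typically handle more carefully.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    After normalization every sample has the same distribution: the value
    at rank k is replaced by the mean of the rank-k values across samples.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")

    # Mean of the k-th smallest value across samples, for each rank k.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]

    normalized = []
    for s in samples:
        # Feature indices ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: two samples, three features each (hypothetical values).
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

Each feature keeps its rank within its own sample, so the ordering of features is preserved while between-sample distribution differences are removed.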
Project description:Machine learning methods, particularly neural networks trained on large datasets, are transforming how scientists approach scientific discovery and experimental design. However, current state-of-the-art neural networks are limited by their uninterpretability: despite providing accurate predictions, they cannot describe how they arrived at their predictions. Here, using an "interpretable-by-design" approach, we present a neural network model that provides insights into RNA splicing, a fundamental process in the transfer of genomic information into functional biochemical products. Although we designed our model to emphasize interpretability, its predictive accuracy is on par with state-of-the-art models. To demonstrate the model's interpretability, we introduce a visualization that, for any given exon, allows us to trace and quantify the entire decision process from input sequence to output splicing prediction. Importantly, the model revealed novel components of the splicing logic, which we experimentally validated. This study highlights how interpretable machine learning can advance scientific discovery.
Project description:Gene expression profiles were generated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Sample numbers 200-222 form a validation set. This data is used to model a machine learning classifier for Estrogen Receptor Status.
Project description:Extracting urinary proteome spectral features with advanced mass spectrometry and machine learning algorithms could yield more accurate results for disease classification. We sought to establish a novel diagnostic model of kidney diseases by combining the XGBoost machine learning algorithm with complete urinary proteomic information.