Project description:Gene expression profiles were generated from RNA isolated from 199 primary breast cancer patients. Samples 1-176 were used in another study, GEO Series GSE22820, and form the training data set in this study. Samples 200-222 form a validation set. These data are used to train a machine learning classifier for estrogen receptor (ER) status. The classifier was built to predict ER status using only three gene features.
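The train/validation design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the description does not name the learning algorithm or the three genes, so a nearest-centroid classifier is swapped in here, and the expression values, labels, and function names (`fit_centroids`, `predict`) are hypothetical toy placeholders, not data or code from the study.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Return {label: centroid}, where a centroid is the per-gene mean expression."""
    centroids = {}
    for label in set(labels):
        rows = [x for x, lab in zip(features, labels) if lab == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sample, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical toy expression values for three unnamed genes (not study data).
train_features = [[9.1, 8.7, 7.9], [8.8, 9.0, 8.2], [3.2, 4.1, 2.9], [2.9, 3.8, 3.3]]
train_labels = ["ER+", "ER+", "ER-", "ER-"]

# A held-out validation set, mirroring the separate validation samples.
valid_features = [[8.5, 8.9, 7.5], [3.5, 3.9, 3.0]]
valid_labels = ["ER+", "ER-"]

centroids = fit_centroids(train_features, train_labels)
predictions = [predict(centroids, x) for x in valid_features]
accuracy = sum(p == t for p, t in zip(predictions, valid_labels)) / len(valid_labels)
```

Keeping the validation samples out of `fit_centroids` is the point of the sketch: accuracy is measured only on samples the model never saw during training.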
Project description:RNA-sequencing (RNA-seq) is widely used for analysis of alternative splicing, but in practice it has inherent biases that hinder its ability to detect and quantify splicing events. To address this, we present a targeted RNA-seq method that specifically enriches for splicing-informative junction-spanning reads. Local Splicing Variation sequencing (LSV-seq) utilizes multiplexed reverse transcription from highly scalable pools of primers anchored near splice junctions of interest. Primers are designed using Optimal Prime, a novel dedicated machine learning algorithm trained on the performance of thousands of primer sequences. LSV-seq achieves high on-target capture rates and concordance with RNA-seq, while requiring several-fold lower sequencing depth. We use LSV-seq to target events with low coverage in GTEx RNA-seq data and discover hundreds of previously hidden tissue-specific splicing events. Our results demonstrate the ability of LSV-seq to capture alternative splicing with exceptional sensitivity and highlight its potential to improve the detection of other RNA features of interest.
Project description:Leaf senescence is a tightly controlled and complex developmental process that shares many similarities across species, yet the underlying conserved molecular mechanisms remain poorly understood. Here, we observed functional conservation of the pathways underlying leaf senescence in A. thaliana, O. sativa, and S. lycopersicum. Machine learning-based integration of data from nearly 10 000 samples yielded a universal regulatory network of leaf senescence and identified mitostasis as the central cross-species biological hub. We measured and compared changes in the transcriptome and metabolome of A. thaliana, O. sativa, and S. lycopersicum leaves under mitostress/natural senescence. Across species, enrichment of mitostasis-related transcription factor binding sites and changes in amino acid levels converge on putative senescence modulators. Our study provides a cross-species, multi-omics perspective on the conserved mechanisms of leaf senescence.
Project description:Background - Senescence classification is an acknowledged challenge within the field, as markers are cell-type and context dependent. Currently, multiple morphological and immunofluorescence markers are required. However, emerging scRNA-seq datasets have enabled increased understanding of senescent cell heterogeneity. Methods - Here we present SenPred, a machine-learning pipeline which identifies fibroblast senescence based on single-cell transcriptomics from fibroblasts grown in 2D and 3D. Results - Using scRNA-seq of both 2D and 3D deeply senescent fibroblasts, the model predicts intra-experimental fibroblast senescence to a high degree of accuracy (>99% true positives). Applying SenPred to in vivo whole skin scRNA-seq datasets reveals that a model trained on cells grown in 2D cannot accurately detect fibroblast senescence in vivo. Importantly, utilising scRNA-seq from 3D deeply senescent fibroblasts refines our ML model, leading to improved detection of senescent cells in vivo. This is context specific, with the SenPred pipeline proving effective when detecting senescent human dermal fibroblasts in vivo, but not senescence of lung fibroblasts or whole skin. Conclusions - We position this as a proof-of-concept study based on currently available scRNA-seq datasets, with the intention to build a holistic model to detect multiple senescent triggers using future emerging datasets. The development of SenPred has allowed for detection of an in vivo senescent fibroblast burden in human skin, which could have broader implications for the treatment of age-related morbidities.
Project description:To achieve the best outcomes, breast cancer necessitates robust strategies for early detection. However, reliable blood-based tests for identifying early-stage disease remain elusive. Here we have employed plasma metabolomics and machine learning techniques to establish a non-invasive metabolic approach for early detection of breast cancer.
Project description:We introduce MSTracer, a tool for peptide feature detection from MS1, which incorporates a machine-learning-combined scoring function based on peptide isotopic distribution and peptide intensity shape on the LC-MS map. By using Support Vector Regression (SVR), the quality of detected peptide features is remarkably improved. By utilising Neural Networks (NN), scores that indicate feature quality are also assigned to the detected features. We use the human HeLa LC-MS/MS dataset to train and test the tool and compare its results with MaxQuant, OpenMS, and Dinosaur.
Project description:Oncogene-induced senescence (OIS) is a tumour suppressive response to oncogene activation that can be transmitted to neighbouring cells through secreted factors of the senescence-associated secretory phenotype (SASP). Using single-cell transcriptomics we observed two distinct endpoints, a primary endpoint marked by Ras and a secondary endpoint marked by Notch. We find that secondary senescence in vitro and in vivo requires Notch, rather than SASP alone as previously thought. Currently, primary and secondary senescent cells are not thought of as functionally distinct endpoints. A blunted SASP response and the induction of fibrillar collagens in secondary senescence compared to OIS point towards a functional diversification.
Project description:The early detection of tissue and organ damage associated with autoimmune diseases (AID) has been identified as key to improving long-term survival, but non-invasive biomarkers are lacking. Elevated cell-free DNA (cfDNA) levels have been observed in AID and inflammatory bowel disease (IBD), prompting interest in using cfDNA as a potential non-invasive diagnostic and prognostic biomarker. Despite these known disease-related changes in concentration, it remains impossible to identify AID and IBD patients through cfDNA analysis alone. By using unsupervised clustering on large sets of shallow whole-genome sequencing (sWGS) cfDNA data, we uncover AID- and IBD-specific genome-wide patterns in plasma cfDNA in both the obstetric and general AID and IBD populations. Supervised learning of the genome-wide patterns allows AID prediction with 50% sensitivity at 95% specificity. Importantly, the method can identify pregnant women with AID during routine non-invasive prenatal screening. Since AID pregnancies have an increased risk of severe complications, early recognition or detection of new-onset AID can redirect pregnancy management and limit potential adverse events. This method opens up new avenues for screening, diagnosis and monitoring of AID and IBD.
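A figure such as "50% sensitivity at 95% specificity" is usually obtained by fixing a score threshold on the control group and then counting how many cases exceed it. The sketch below shows that arithmetic under stated assumptions: the description does not say how the threshold was chosen, the function name `sensitivity_at_specificity` is hypothetical, and the scores are made-up toy values, not results from the study.

```python
import math


def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Pick the lowest threshold that calls at least `specificity` of controls
    negative, then report the fraction of cases scoring above it."""
    controls = sorted(control_scores)
    n_controls = len(controls)
    # Number of controls that must fall at or below the threshold.
    # The small epsilon guards against floating-point error in the product.
    n_negative = math.ceil(specificity * n_controls - 1e-9)
    threshold = controls[n_negative - 1] if n_negative > 0 else float("-inf")
    sensitivity = sum(s > threshold for s in case_scores) / len(case_scores)
    achieved_specificity = sum(s <= threshold for s in controls) / n_controls
    return threshold, sensitivity, achieved_specificity


# Hypothetical classifier scores (higher = more disease-like); not study data.
control_scores = [i / 100 for i in range(1, 21)]  # 20 controls: 0.01 .. 0.20
case_scores = [0.05, 0.10, 0.25, 0.30]            # 4 cases

threshold, sensitivity, achieved_specificity = sensitivity_at_specificity(
    case_scores, control_scores, specificity=0.95
)
```

With these toy scores, 19 of the 20 controls fall at or below the threshold (95% specificity) and 2 of the 4 cases score above it (50% sensitivity).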