Project description:Background: Integration of multi-omics data can provide a more complex view of the biological system consisting of different interconnected molecular components, a crucial aspect for developing novel personalised therapeutic strategies for complex diseases. Various tools have been developed to integrate multi-omics data. However, an efficient multi-omics framework for regulatory network inference at the genome level that incorporates prior knowledge is still to emerge. Results: We present IntOMICS, an efficient integrative framework based on Bayesian networks. IntOMICS systematically analyses gene expression, DNA methylation, copy number variation and biological prior knowledge to infer regulatory networks. IntOMICS complements the missing biological prior knowledge by so-called empirical biological knowledge, estimated from the available experimental data. Regulatory networks derived from IntOMICS provide deeper insights into the complex flow of genetic information on top of the increasing accuracy trend compared to a published algorithm designed exclusively for gene expression data. The ability to capture relevant crosstalks between multi-omics modalities is verified using known associations in microsatellite stable/instable colon cancer samples. Additionally, IntOMICS performance is compared with two algorithms for multi-omics regulatory network inference that can also incorporate prior knowledge in the inference framework. IntOMICS is also applied to detect potential predictive biomarkers in microsatellite stable stage III colon cancer samples. Conclusions: We provide IntOMICS, a framework for multi-omics data integration using a novel approach to biological knowledge discovery. IntOMICS is a powerful resource for exploratory systems biology and can provide valuable insights into the complex mechanisms of biological processes that have a vital role in personalised medicine.
Project description:Motivation: The size of available omics datasets is steadily increasing with technological advancement in recent years. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare, models that are optimized for large datasets usually operate as black boxes. In high-stakes scenarios, like healthcare, using a black-box model poses safety and security issues. Without an explanation about molecular factors and phenotypes that affected the prediction, healthcare providers are left with no choice but to blindly trust the models. We propose a new type of artificial neural network, named Convolutional Omics Kernel Network (COmic). By combining convolutional kernel networks with pathway-induced kernels, our method enables robust and interpretable end-to-end learning on omics datasets ranging in size from a few hundred to several hundreds of thousands of samples. Furthermore, COmic can be easily adapted to utilize multiomics data. Results: We evaluated the performance capabilities of COmic on six different breast cancer cohorts. Additionally, we trained COmic models on multiomics data using the METABRIC cohort. Our models performed either better than or similar to competitors on both tasks. We show how the use of pathway-induced Laplacian kernels opens the black-box nature of neural networks and results in intrinsically interpretable models that eliminate the need for post hoc explanation models. Availability and implementation: Datasets, labels, and pathway-induced graph Laplacians used for the single-omics tasks can be downloaded at https://ibm.ent.box.com/s/ac2ilhyn7xjj27r0xiwtom4crccuobst/folder/48027287036. While datasets and graph Laplacians for the METABRIC cohort can be downloaded from the above-mentioned repository, the labels have to be downloaded from cBioPortal at https://www.cbioportal.org/study/clinicalData?id=brca_metabric.
COmic source code as well as all scripts necessary to reproduce the experiments and analysis are publicly available at https://github.com/jditz/comics.
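To make the notion of a pathway-induced Laplacian kernel concrete, the following is a minimal sketch, not the COmic implementation. It assumes an unnormalized graph Laplacian L = D - A over the genes of a single pathway and the bilinear kernel k(x, y) = x^T L y; the function names, the four-gene toy pathway, and the sample values are hypothetical.

```python
import numpy as np

def graph_laplacian(n_genes, edges):
    """Unnormalized Laplacian L = D - A of an undirected pathway graph.

    Assumption: the abstract does not say which Laplacian variant is used.
    """
    adjacency = np.zeros((n_genes, n_genes))
    for i, j in edges:
        adjacency[i, j] = adjacency[j, i] = 1.0
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def pathway_kernel(samples, laplacian):
    """Kernel matrix K[a, b] = x_a^T L x_b for samples given as rows."""
    return samples @ laplacian @ samples.T

# Hypothetical pathway: four genes connected in a chain 0-1-2-3.
L = graph_laplacian(4, [(0, 1), (1, 2), (2, 3)])

# Three toy samples (rows) measured on the four pathway genes (columns).
X = np.array([
    [1.0, 1.0, 1.0, 1.0],   # constant profile across the pathway
    [1.0, 0.0, 1.0, 0.0],   # alternating profile
    [2.0, 1.0, 0.0, -1.0],  # linear gradient along the chain
])
K = pathway_kernel(X, L)
```

Because x^T L x equals the sum of squared differences across pathway edges, a profile that is constant over a connected pathway has self-similarity zero, while profiles that vary along edges score higher. This link between kernel values and pathway topology is what makes such kernels readable in biological terms.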
Project description:The insulin-like growth factors (IGFs)/insulin resistance (IR) axis is the major metabolic hormonal pathway mediating the biologic mechanism of several complex human diseases, including type 2 diabetes (T2DM) and cancers. The genomewide association study (GWAS)-based approach has neither fully characterized the phenotype variation nor provided a comprehensive understanding of the regulatory biologic mechanisms. We applied systematic genomics to integrate our previous GWAS data for IGF-I and IR with multi-omics datasets, e.g., whole-blood expression quantitative trait loci, molecular pathways, and gene network, to capture the full range of genetic functionalities associated with IGF-I/IR and key drivers (KDs) in gene-regulatory networks. We identified both shared (e.g., T2DM, lipid metabolism, and estimated glomerular filtration signaling) and IR-specific (e.g., mechanistic target of rapamycin, phosphoinositide 3-kinases, and erb-b2 receptor tyrosine kinase 4 signaling) molecular biologic processes of IGF-I/IR axis regulation. Next, by using tissue-specific gene-gene interaction networks, we identified both well-established (e.g., IRS1 and IGF1R) and novel (e.g., AKT1, HRAS, and JAK1) KDs in the IGF-I/IR-associated subnetworks. Our results, if validated in additional genomic studies, may provide robust, comprehensive insights into the mechanisms of IGF-I/IR regulation and highlight potential novel genetic targets as preventive and therapeutic strategies for the associated diseases, e.g., T2DM and cancers.
Project description:The polyadenosine (poly(A)) tail found on the 3'-end of almost all eukaryotic mRNAs is important for mRNA stability and regulation of translation. mRNA 3'-end processing occurs co-transcriptionally and involves more than 20 proteins to specifically recognize the polyadenylation site, cleave the pre-mRNA, add a poly(A) tail, and trigger transcription termination. The polyadenylation site (PAS) defines the end of the 3'-untranslated region (3'-UTR) and, therefore, selection of the cleavage site is a critical event in regulating gene expression. Integrated structural biology approaches including biochemical reconstitution of multi-subunit complexes, cross-linking mass spectrometry, and structural analyses by X-ray crystallography and single-particle electron cryo-microscopy (cryoEM) have enabled recent progress in understanding the molecular mechanisms of the mRNA 3'-end processing machinery. Here, we describe new molecular insights into pre-mRNA recognition, cleavage and polyadenylation.
Project description:Summary: Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor and Python package containing computational methods to extract these activities within a unified framework. decoupleR allows us to flexibly run any method with a given resource, including methods that leverage mode of regulation and weights of interactions, which are not present in other frameworks. Moreover, it leverages OmniPath, a meta-resource comprising over 100 databases of prior knowledge. Using decoupleR, we evaluated the performance of methods on transcriptomic and phospho-proteomic perturbation experiments. Our findings suggest that simple linear models and the consensus score across top methods perform better than other methods at predicting perturbed regulators. Availability and implementation: decoupleR's open-source code is available in Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/decoupleR.html) for R and in GitHub (https://github.com/saezlab/decoupler-py) for Python. The code to reproduce the results is in GitHub (https://github.com/saezlab/decoupleR_manuscript) and the data in Zenodo (https://zenodo.org/record/5645208). Supplementary information: Supplementary data are available at Bioinformatics Advances online.
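As an illustration of what a "simple linear model" for activity inference can look like, here is a minimal sketch that does not use the decoupleR API. It assumes a univariate formulation: for each regulator, the expression values of all genes in one sample are regressed on that regulator's signed interaction weights (zero for non-targets), and the t-statistic of the slope serves as the activity score. The function name `ulm_scores`, the gene names, and the toy network are hypothetical.

```python
import numpy as np

def ulm_scores(expr, net):
    """Score regulators in one sample with a univariate linear model.

    expr: dict mapping gene -> expression statistic for a single sample.
    net:  dict mapping regulator -> {target gene: signed weight}.
    Returns a dict mapping regulator -> t-statistic of the regression slope.
    """
    genes = sorted(expr)
    y = np.array([expr[g] for g in genes], dtype=float)
    n = len(genes)
    scores = {}
    for regulator, targets in net.items():
        # Weight vector over all genes; non-targets get weight 0.
        x = np.array([targets.get(g, 0.0) for g in genes], dtype=float)
        xc, yc = x - x.mean(), y - y.mean()
        sxx = (xc ** 2).sum()
        if n < 3 or sxx == 0.0:
            scores[regulator] = float("nan")  # slope is not estimable
            continue
        slope = (xc * yc).sum() / sxx
        residuals = yc - slope * xc
        std_err = np.sqrt((residuals ** 2).sum() / (n - 2) / sxx)
        scores[regulator] = float(slope / std_err)
    return scores

# Toy sample and toy prior-knowledge network (illustrative values only).
expr = {"G1": 2.0, "G2": 1.5, "G3": -1.0, "G4": 0.1, "G5": -0.2, "G6": 0.3}
net = {
    "TF_A": {"G1": 1.0, "G2": 1.0, "G3": -1.0},  # activates G1, G2; represses G3
    "TF_B": {"G4": 1.0, "G5": 1.0},
}
scores = ulm_scores(expr, net)
```

In this toy sample the targets of TF_A move in the direction its mode of regulation predicts, so TF_A receives a positive score, whereas the targets of TF_B do not stand out and its score is lower. The signed weights are what allow repression and activation to contribute consistently to a single score.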
Project description:Background: Glioblastoma multiforme (GBM) is among the most aggressive cancers, with current treatments limited in efficacy. A significant hurdle in the treatment of GBM is the resistance to the chemotherapeutic agent temozolomide (TMZ). The methylation status of the MGMT promoter has been implicated as a critical biomarker of response to TMZ. Methods: To explore the mechanisms underlying resistance, we developed two TMZ-resistant GBM cell lines through a gradual increase in TMZ exposure. Transcriptome sequencing of TMZ-resistant cell lines revealed that alterations in histone post-translational modifications might be instrumental in conferring TMZ resistance. Subsequently, multi-omics analysis suggested a strong association between histone H3 lysine 9 acetylation (H3K9ac) levels and TMZ resistance. Results: We observed a significant correlation between the expression of H3K9ac and MGMT, particularly in the unmethylated MGMT promoter samples. More importantly, our findings suggest that H3K9ac may enhance MGMT transcription by facilitating the recruitment of the SP1 transcription factor to the MGMT transcription factor binding site. Additionally, by analyzing single-cell transcriptomics data from matched primary and recurrent GBM tumors treated with TMZ, we modeled the molecular shifts occurring upon tumor recurrence. We also noted a reduction in tumor stem cell characteristics, accompanied by an increase in H3K9ac, SP1, and MGMT levels, underscoring the potential role of H3K9ac in tumor relapse following TMZ therapy. Conclusions: The increase in H3K9ac appears to enhance the recruitment of the transcription factor SP1 to its binding sites within the MGMT locus, consequently upregulating MGMT expression and driving TMZ resistance in GBM.
Project description:Many studies have used single-cell RNA sequencing (scRNA-seq) to infer gene regulatory networks (GRNs), which are crucial for understanding complex cellular regulation. However, the inherent noise and sparsity of scRNA-seq data present significant challenges to accurate GRN inference. This review explores one promising approach that has been proposed to address these challenges: integrating prior knowledge into the inference process to enhance the reliability of the inferred networks. We categorize common types of prior knowledge, such as experimental data and curated databases, and discuss methods for representing priors, particularly through graph structures. In addition, we classify recent GRN inference algorithms based on their ability to incorporate these priors and assess their performance in different contexts. Finally, we propose a standardized benchmarking framework to evaluate algorithms more fairly, ensuring biologically meaningful comparisons. This review provides guidance for researchers selecting GRN inference methods and offers insights for developers looking to improve current approaches and foster innovation in the field.
Project description:The identification of good targets is a critical step for the development of targeted therapies for cancer treatment. Here, we used a multi-omics approach to delineate potential targets on chromosome 20q, which frequently shows a complex pattern of DNA copy number amplification in many human cancers suggesting the presence of multiple driver genes. By comparing the amounts of individual mRNAs in cancer from 11 different human tissues with those in their corresponding normal tissues, we identified 18 genes that were robustly elevated across human cancers. Moreover, we found that higher expression levels of a majority of these genes were associated with poor prognosis in many human cancer types. Using DNA copy number and expression data for all 18 genes obtained from The Cancer Genome Atlas project, we discovered that amplification is a major mechanism driving overexpression of these 18 genes in the majority of human cancers. Our integrated analysis suggests that 18 genes on chromosome 20q might serve as novel potential molecular targets for targeted cancer therapy.
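The link between amplification and overexpression described above can be checked gene by gene with a simple correlation between copy number and expression across samples. The following is a minimal sketch under stated assumptions: it uses Pearson correlation, which the abstract does not specify, and the function name and the toy values are hypothetical, not data from The Cancer Genome Atlas.

```python
import numpy as np

def cn_expression_correlation(copy_number, expression):
    """Per-gene Pearson correlation between copy number and expression.

    Both inputs are arrays with genes as rows and samples as columns.
    """
    cn = copy_number - copy_number.mean(axis=1, keepdims=True)
    ex = expression - expression.mean(axis=1, keepdims=True)
    denom = np.sqrt((cn ** 2).sum(axis=1) * (ex ** 2).sum(axis=1))
    return (cn * ex).sum(axis=1) / denom

# Toy values for two hypothetical genes measured in five samples.
copy_number = np.array([
    [2.0, 3.0, 4.0, 5.0, 6.0],   # gene 0: progressively amplified
    [2.0, 2.0, 3.0, 4.0, 2.0],   # gene 1: occasional gains
])
expression = np.array([
    [1.0, 1.6, 2.1, 2.4, 3.2],   # gene 0: expression tracks copy number
    [5.0, 1.0, 4.0, 2.0, 3.0],   # gene 1: expression unrelated to copy number
])
r = cn_expression_correlation(copy_number, expression)
```

A gene whose expression rises with its copy number, like gene 0 here, yields a correlation close to 1 and is a candidate for amplification-driven overexpression. A weak correlation, as for gene 1, points to other regulatory mechanisms.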
Project description:Most eukaryotic mRNA precursors (pre-mRNAs) must undergo extensive processing, including cleavage and polyadenylation at the 3'-end. Processing at the 3'-end is controlled by sequence elements in the pre-mRNA (cis elements) as well as protein factors. Despite the seeming biochemical simplicity of the processing reactions, more than 14 proteins have been identified for the mammalian complex, and more than 20 proteins have been identified for the yeast complex. The 3'-end processing machinery also has important roles in transcription and splicing. The mammalian machinery contains several sub-complexes, including cleavage and polyadenylation specificity factor, cleavage stimulation factor, cleavage factor I, and cleavage factor II. Additional protein factors include poly(A) polymerase, poly(A)-binding protein, symplekin, and the C-terminal domain of RNA polymerase II largest subunit. The yeast machinery includes cleavage factor IA, cleavage factor IB, and cleavage and polyadenylation factor.
Project description:The replication-dependent histone mRNAs in metazoa are not polyadenylated, in contrast to the bulk of mRNA. Instead, they contain an RNA stem-loop (SL) structure close to the 3' end of the mature RNA, and this 3' end is generated by cleavage using a machinery involving the U7 snRNP and protein factors such as the stem-loop binding protein (SLBP). This machinery of 3' end processing is related to that of polyadenylation as protein components are shared between the systems. It is commonly believed that histone 3' end processing is restricted to metazoa and green algae. In contrast, polyadenylation is ubiquitous in Eukarya. However, using computational approaches, we have now identified components of histone 3' end processing in a number of protozoa. Thus, the histone mRNA stem-loop structure as well as the SLBP protein are present in many different protozoa, including Dictyostelium, alveolates, Trypanosoma, and Trichomonas. These results show that the histone 3' end processing machinery is more ancient than previously anticipated and can be traced to the root of the eukaryotic phylogenetic tree. We also identified histone mRNAs from both metazoa and protozoa that are polyadenylated but also contain the signals characteristic of histone 3' end processing. These results provide further evidence that some histone genes are regulated at the level of 3' end processing to produce either polyadenylated RNAs or RNAs with the 3' end characteristic of replication-dependent histone mRNAs.