The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook.
ABSTRACT: Cancer is a complex disease at many different levels, and its molecular phenomenology is correspondingly rich. The mutational and genomic origins of cancer, and their downstream effects on processes such as the reprogramming of gene regulatory control and the molecular pathways that depend on such control, have been recognized as central to the characterization of the disease. More important, though, is understanding their causes, prognosis, and therapeutics. A multitude of factors is associated with anomalous control of gene expression in cancer, and many of them can now be studied comprehensively through experiments based on diverse omic technologies. However, characterizing each dimension of the phenomenon individually has proven insufficient to give a clear picture of expression regulation as a whole. In this review article, we discuss some of the most relevant factors affecting gene expression control, both under normal conditions and in tumor settings. We describe the different omic approaches available, as well as the computational genomic analyses needed to track down these factors. We then present theoretical and computational frameworks developed to integrate the diverse information provided by such single-omic analyses. We place this within a systems biology-based, multi-omic regulatory setting aimed at better understanding the complex interplay of gene expression deregulation in cancer.
Project description: Outcomes for cancer patients vary greatly even within the same tumor type, and characterization of molecular subtypes of cancer holds important promise for improving prognosis and personalized treatment. This promise has motivated recent efforts to produce large amounts of multidimensional genomic (multi-omic) data, but current algorithms still face challenges in the integrated analysis of such data. Here we present Cancer Integration via Multikernel Learning (CIMLR), a new cancer subtyping method that integrates multi-omic data to reveal molecular subtypes of cancer. We apply CIMLR to multi-omic data from 36 cancer types and show significant improvements in both computational efficiency and the ability to extract biologically meaningful cancer subtypes. The discovered subtypes exhibit significant differences in patient survival for 27 of 36 cancer types. Our analysis reveals integrated patterns of gene expression, methylation, point mutations, and copy number changes in multiple cancers and highlights patterns specifically associated with poor patient outcomes.
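To illustrate the multikernel idea, the sketch below builds a Gaussian kernel over patients for each omic layer at several bandwidths, fuses them, and clusters patients spectrally. It is a much simpler stand-in for CIMLR, not its algorithm. CIMLR's own optimization, which learns kernel weights together with a patient similarity matrix, is not reproduced here. The uniform kernel weights, the bandwidth scales, and the k-means step are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) similarity between the rows (patients) of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fused_kernel(omics, scales=(0.5, 1.0, 2.0)):
    """Average Gaussian kernels over omic layers and bandwidths.

    Uniform weights are an assumption; a multikernel method would learn them.
    """
    kernels = []
    for X in omics:
        Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # z-score each feature
        bandwidth = np.sqrt(Z.shape[1])                       # scale with dimension
        kernels.extend(gaussian_kernel(Z, s * bandwidth) for s in scales)
    return np.mean(kernels, axis=0)

def spectral_clusters(K, n_clusters, n_iter=100):
    """Spectral clustering of a patient-by-patient similarity matrix K."""
    d = K.sum(axis=1)
    L = K / np.sqrt(np.outer(d, d))              # D^-1/2 K D^-1/2
    _, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]                    # top eigenvectors embed patients
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Plain k-means with deterministic farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        new = np.array([U[labels == k].mean(axis=0) for k in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

For example, `spectral_clusters(fused_kernel([expression, methylation]), 2)` assigns each patient to one of two putative subtypes. Both inputs must be patient-by-feature matrices whose rows refer to the same patients in the same order.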
Project description: <h4>Background & objective</h4>Genome-wide profiles of tumors obtained using functional genomics platforms are being deposited in public repositories at an astronomical scale, as a result of focused efforts by individual laboratories and large projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium. Consequently, there is an urgent need for reliable tools that integrate and interpret these data in light of current knowledge and disseminate the results to biomedical researchers in a user-friendly manner. We have built the canEvolve web portal to meet this need.<h4>Results</h4>canEvolve's query functionalities are designed to meet the most frequent analysis needs of cancer researchers, with a view to generating novel hypotheses. canEvolve stores gene, microRNA (miRNA) and protein expression profiles, copy number alterations for multiple cancer types, and protein-protein interaction information. canEvolve allows querying of the results of primary, integrative and network analyses of oncogenomics data. Queries for primary analysis cover differential gene and miRNA expression as well as changes in gene copy number measured with SNP microarrays. canEvolve provides results of integrative analysis of gene expression profiles with copy number alterations and with miRNA profiles, as well as generalized integrative analysis using gene set enrichment analysis. The network analysis capability includes storage and visualization of gene co-expression networks, inferred gene regulatory networks and protein-protein interaction information. Finally, canEvolve provides correlations between gene expression and clinical outcomes in terms of univariate survival analysis.<h4>Conclusion</h4>At present, canEvolve provides different types of information extracted from 90 cancer genomics studies comprising more than 10,000 patients.
The presence of multiple data types, novel integrative analysis for identifying regulators of oncogenesis, network analysis and ability to query gene lists/pathways are distinctive features of canEvolve. canEvolve will facilitate integrative and meta-analysis of oncogenomics datasets.<h4>Availability</h4>The canEvolve web portal is available at http://www.canevolve.org/.
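The abstract does not say how canEvolve integrates gene expression with copy number alterations. One simple and common starting point is to correlate each gene's expression with its own copy number across samples, since dosage-driven genes tend to show a strong positive correlation. The sketch below is an assumption about the general technique and does not describe canEvolve's implementation.

```python
import numpy as np

def cis_correlation(expr, cn):
    """Per-gene Pearson correlation between expression and copy number.

    expr, cn: arrays of shape (n_genes, n_samples), with genes and samples in
    the same order in both arrays.
    """
    e = expr - expr.mean(axis=1, keepdims=True)  # center each gene's expression
    c = cn - cn.mean(axis=1, keepdims=True)      # center each gene's copy number
    num = (e * c).sum(axis=1)
    den = np.sqrt((e ** 2).sum(axis=1) * (c ** 2).sum(axis=1))
    return num / den

def dosage_driven(expr, cn, genes, min_r=0.5):
    """Return gene names whose expression tracks copy number (min_r is arbitrary)."""
    r = cis_correlation(expr, cn)
    return [g for g, v in zip(genes, r) if v >= min_r]
```

Genes with constant expression or copy number give a zero denominator, so a real analysis would filter them out first and assess significance, for example by permutation.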
Project description: Breast cancer (BC) is the second most common type of cancer and a major cause of death for women. Commonly, BC patients are assigned to risk groups based on a combination of prognostic and predictive factors (e.g., patient age, tumor size, tumor grade, hormone receptor status). Although this approach can identify risk groups with different prognoses, patients are highly heterogeneous in their response to treatment. To improve predictions for BC patients, we extended clinical models, which include prognostic and predictive factors, to integrate whole-omic profiles for gene expression and copy number variants (CNVs). We describe a modeling framework that can incorporate clinical risk factors, high-dimensional omics profiles, and interactions between omics and non-omic factors (e.g., treatment). We used the proposed modeling framework and data from METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) to assess how considering omics and omic-by-treatment interactions affects the accuracy of BC patient survival predictions. Our analysis shows that omics and omic-by-treatment interactions explain a sizable fraction of the variance in survival time that is not explained by commonly used clinical covariates. The sizable interaction effects observed, together with the increase in prediction accuracy, suggest that whole-omic profiles could be used to improve prognosis prediction among BC patients.
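The core idea of adding omic main effects and omic-by-treatment interaction columns to a clinical model, then checking how much extra variance they explain, can be sketched with ordinary least squares on log survival time. This is a deliberate simplification: it ignores censoring and uses simulated data, and the abstract does not state that the authors' framework works this way. All variable names and effect sizes are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def design(clinical, treatment, omics=None):
    """Clinical covariates plus treatment, optionally with omic main effects and
    omic-by-treatment interactions (each omic feature times the 0/1 treatment)."""
    cols = [clinical, treatment[:, None]]
    if omics is not None:
        cols += [omics, omics * treatment[:, None]]
    return np.column_stack(cols)

# Hypothetical simulated cohort: the omic effect on log survival time exists
# only in treated patients, i.e. it is a pure omic-by-treatment interaction.
rng = np.random.default_rng(0)
n = 300
clinical = rng.normal(size=(n, 2))        # e.g. standardized age and tumor size
treatment = rng.integers(0, 2, size=n).astype(float)
omics = rng.normal(size=(n, 5))           # e.g. expression or CNV summaries
log_time = clinical @ np.array([0.5, 0.5]) + treatment * omics[:, 0] + rng.normal(0, 0.5, n)

r2_clinical = r_squared(design(clinical, treatment), log_time)
r2_full = r_squared(design(clinical, treatment, omics), log_time)
print(f"clinical only: {r2_clinical:.2f}, with omics and interactions: {r2_full:.2f}")
```

Because the models are nested, in-sample R^2 can only increase as columns are added. With thousands of real omic features, any claim about predictive accuracy therefore requires held-out data and regularization.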
Project description: Multiple omic profiles have been generated for many cancer types; however, comprehensive assessment of their prognostic value across cancers is limited. We conducted a pan-cancer prognostic assessment and present a multi-omic kernel machine learning method to systematically quantify the prognostic value of high-throughput genomic, epigenomic, and transcriptomic profiles individually, integratively, and in combination with clinical factors for 3,382 samples across 14 cancer types. We found that prognostic performance varied substantially across cancer types. mRNA and miRNA expression profiles frequently performed best, followed by DNA methylation profiles. Germline susceptibility variants showed consistently low prognostic performance across cancer types. Integrating omic profiles with clinical variables substantially improved prognostic performance over clinical variables alone in half of the cancer types examined. Moreover, we showed that the kernel machine learning method consistently outperformed existing prognostic signatures, suggesting that including a large number of omic biomarkers may substantially improve prognostic assessment. Our study provides a comprehensive portrait of the omic architecture of tumor prognosis across cancers and highlights the prognostic value of genome-wide omic biomarker aggregation, which may facilitate refined prognostic assessment in the era of precision oncology.
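Comparing prognostic performance across omic layers requires a metric that handles censored follow-up. The abstract does not name the metric used. Harrell's concordance index (C-index) is a common choice: it is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the event first. Below is a minimal sketch of that metric, not the authors' evaluation code.

```python
def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    time: follow-up times; event: 1 if the event (e.g. death) was observed,
    0 if censored; risk: predicted risk scores (higher = worse prognosis).
    Tied risk scores count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[j] > time[i]:  # j was still event-free when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. Comparing the C-index of a clinical-only model with that of a clinical-plus-omics model on held-out patients is one way to quantify added prognostic value.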
Project description: There is an increased need for integrative analyses of multi-omic data. We present and benchmark a novel tensorial independent component analysis (tICA) algorithm against current state-of-the-art methods. We find that tICA outperforms competing methods in identifying biological sources of data variation at a reduced computational cost. On epigenetic data, tICA can identify methylation quantitative trait loci at high sensitivity. In the cancer context, tICA identifies gene modules whose expression variation across tumours is driven by copy-number or DNA methylation changes, but whose deregulation relative to normal tissue is independent of such alterations, a result we validate by direct analysis of individual data types.
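The tICA algorithm itself is not described in enough detail here to reproduce. The sketch below shows two building blocks that tensorial methods rest on: mode-n unfolding of a data tensor (for example genes × samples × data types) into a matrix, and standard matrix FastICA with deflation and the log-cosh contrast. Running FastICA on an unfolded tensor is only a simplification and is not tICA. The function names are our own.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows index `mode`, columns span all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fast_ica(X, n_components, n_iter=200, tol=1e-6, seed=0):
    """Deflation FastICA with the log-cosh (tanh) contrast.

    X: (n_mixtures, n_observations). Returns estimated sources with shape
    (n_components, n_observations), up to sign, scale and order.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)
    # Whitening: project onto leading eigenvectors and rescale to unit variance.
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    idx = np.argsort(d)[::-1][:n_components]
    Z = (E[:, idx] / np.sqrt(d[idx])).T @ Xc
    W = np.zeros((n_components, n_components))
    for k in range(n_components):
        w = rng.normal(size=n_components)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wz = w @ Z
            g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w  # fixed-point step
            w_new -= W[:k].T @ (W[:k] @ w_new)  # remove already-found directions
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[k] = w
    return W @ Z
```

For a tensor `T` of shape (genes, samples, data types), `unfold(T, 1)` yields a samples × (genes · data types) matrix. Which mode to unfold, and whether independence is sought across genes or samples, are modeling choices that this sketch leaves open.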
Project description: <h4>Background</h4>Identification of synthetic lethal interactions in cancer cells could offer promising new therapeutic targets. Large-scale functional genomic screening presents an opportunity to test large numbers of cancer synthetic lethal hypotheses. Methods that enrich for candidate synthetic lethal targets in molecularly defined cancer cell lines can steer the effective design of screening efforts. Loss of one partner of a synthetic lethal gene pair creates a dependency on the other; thus, synthetic lethal gene pairs should never show simultaneous loss-of-function. We have developed a computational approach to mine large multi-omic cancer data sets and identify gene pairs with mutually exclusive loss-of-function. Since loss-of-function may not always be genetic, we look for deleterious mutations, gene deletion and/or loss of mRNA expression, the last defined by bimodality using a novel algorithm, BiSEp.<h4>Results</h4>Applying this toolkit to both tumour cell line and patient data, we achieve statistically significant enrichment for experimentally validated tumour suppressor genes and synthetic lethal gene pairings. Notably, not relying on genetic loss alone reveals a number of known synthetic lethal relationships that would otherwise be missed, markedly improving on genetic-only predictions. We go on to establish a biological rationale for a number of novel candidate synthetic lethal gene pairs with demonstrated dependencies in published cancer cell line shRNA screens.<h4>Conclusions</h4>This work introduces a multi-omic approach to define gene loss-of-function and to enrich for candidate synthetic lethal gene pairs in cell lines that are testable through functional screens. In doing so, we offer an additional resource for generating new cancer drug target and combination hypotheses. The algorithms discussed are freely available in the BiSEp CRAN package at http://cran.r-project.org/web/packages/BiSEp/index.html .
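The logic described above (call loss-of-function from any line of evidence, then look for gene pairs that are lost together less often than expected by chance) can be sketched with a one-sided hypergeometric test for depleted co-occurrence. This is a Python illustration of the general technique, not the BiSEp R package or its API. The fixed expression threshold is a hypothetical stand-in for BiSEp's bimodality-based calls.

```python
from math import comb

def loss_of_function(mutated, deleted, expression, low_threshold):
    """Flag loss (1) if a sample has a deleterious mutation, a deletion, or
    expression below `low_threshold` (a hypothetical cut-off standing in for
    a bimodality-derived one)."""
    return [int(bool(m) or bool(d) or e < low_threshold)
            for m, d, e in zip(mutated, deleted, expression)]

def exclusivity_p_value(loss_a, loss_b):
    """One-sided hypergeometric p-value for depleted co-loss of two genes.

    Small values mean the genes are lost together less often than chance
    predicts, as expected for a synthetic lethal pair.
    """
    n = len(loss_a)
    k_a, k_b = sum(loss_a), sum(loss_b)
    both = sum(1 for a, b in zip(loss_a, loss_b) if a and b)
    # P(X <= both), X ~ Hypergeometric(population n, k_a successes, k_b draws)
    tail = sum(comb(k_a, x) * comb(n - k_a, k_b - x) for x in range(both + 1))
    return tail / comb(n, k_b)
```

Screening many gene pairs this way calls for multiple-testing correction. Tumour type can also confound the test, because two genes that are each lost in different cancer types will look exclusive in a pooled cohort.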
Project description: Mixed microbial communities underpin important biotechnological processes such as biological wastewater treatment (BWWT). Detailed knowledge of the relationships between community structure and function is essential for ultimately driving these systems towards desired outcomes, e.g., enrichment in organisms capable of accumulating valuable resources during BWWT. A comparative integrated omic analysis including metagenomics, metatranscriptomics and metaproteomics was carried out to elucidate functional differences between seasonally distinct oleaginous mixed microbial communities (OMMCs) sampled from an anoxic BWWT tank. A computational framework for the reconstruction of community-wide metabolic networks from multi-omic data was developed. These networks provide an overview of functional capabilities by incorporating gene copy, transcript and protein abundances. To identify functional genes that have a disproportionately important role in community function, we define high relative gene expression and high betweenness centrality relative to node degree as gene-centric and network-topological features, respectively. Genes exhibiting high expression relative to gene copy abundance include genes involved in glycerolipid metabolism, particularly triacylglycerol lipase, encoded by known lipid-accumulating populations, e.g., Candidatus Microthrix parvicella. Genes with high relative gene expression and topologically important positions in the network include genes involved in nitrogen metabolism and fatty acid biosynthesis, encoded by Nitrosomonas spp. and Rhodococcus spp. Such genes may be regarded as 'keystone genes', as they are likely to be encoded by keystone species. Linking key functionalities to community members through integrated omics opens up exciting possibilities for devising prediction and control strategies for microbial communities in the future.
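The two features named above can be computed in a few lines: relative gene expression as transcript abundance divided by gene copy abundance, and betweenness centrality divided by node degree. The sketch below uses Brandes' algorithm on an unweighted, undirected toy network stored as an adjacency dictionary. The network representation, the function names, and the absence of any cut-offs are assumptions. The study's network reconstruction is not reproduced here.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj: dict mapping each node to a set of neighbours. Returns unnormalized
    betweenness, with each unordered pair of endpoints counted once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was seen twice

def keystone_scores(adj, transcripts, gene_copies):
    """Per gene: (transcripts / gene copies, betweenness / degree)."""
    bc = betweenness(adj)
    return {v: (transcripts[v] / gene_copies[v], bc[v] / len(adj[v]))
            for v in adj if adj[v]}
```

Genes that score high on both features would be the 'keystone gene' candidates. In a metabolic network, the edge definition (for example, shared metabolites between gene products) determines what betweenness actually measures.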
Project description: Psychiatric disorders, including suicide, are complex conditions influenced by many different risk factors. It has been estimated that genetic factors contribute up to 50% of suicide risk. As the candidate gene approach has not identified a gene or set of genes that can serve as biomarkers for suicidal behaviour, much is expected from cutting-edge technological approaches that can interrogate several hundred, or even millions, of biomarkers at a time. These include the '-omic' approaches, such as genomics, transcriptomics, epigenomics, proteomics and metabolomics. Indeed, these have revealed new candidate biomarkers associated with suicidal behaviour. The most interesting of these have been implicated in inflammation and immune responses, which have been revealed through different study approaches, from genome-wide single-nucleotide polymorphism studies and the microRNA transcriptome to the proteome and metabolome. However, the massive amounts of data generated by '-omic' technologies demand powerful computational analysis as well as specially trained personnel. In this regard, machine learning approaches are beginning to pave the way towards personalized psychiatry.
Project description: The mechanisms that lead to dramatic differences between closely related pathogens are not always readily apparent. For example, the genomes of Yersinia pestis (YP), the causative agent of plague, which has a high mortality rate, and Yersinia pseudotuberculosis (YPT), an enteric pathogen with a modest mortality rate, are highly similar apart from some species-specific differences; however, the molecular causes of their distinct clinical outcomes remain poorly understood. In this study, a temporal multi-omic analysis of YP and YPT at physiologically relevant temperatures was performed to gain insight into how YP, an acute and highly lethal bacterial pathogen, differs from its less virulent progenitor, YPT. This analysis revealed higher gene and protein expression levels of conserved major virulence factors in YP relative to YPT, including the Yop virulon and the pH6 antigen. This suggests that adaptation in the regulatory architecture, in addition to the presence of unique genetic material, may contribute to the increased pathogenicity of YP relative to YPT. Additionally, global transcriptome and proteome responses of YP and YPT revealed conserved post-transcriptional control of metabolism and of the translational machinery, including the modulation of glutamate levels in Yersiniae. Finally, the omics data were coupled with a computational network analysis, allowing efficient prediction of novel Yersinia virulence factors based on gene and protein expression patterns.
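The abstract does not describe how the network analysis predicts virulence factors from expression patterns. A common and simple form of this reasoning is guilt by association: rank uncharacterized genes by how closely their temporal expression or protein profile follows that of known virulence factors. The sketch below is that swapped-in technique under stated assumptions and is not the study's method. Gene names and profiles are hypothetical.

```python
import numpy as np

def rank_candidates(profiles, known):
    """Rank genes by Pearson correlation with the mean profile of known
    virulence factors (guilt by association).

    profiles: dict gene -> sequence of expression (or protein) values over
    time points; known: iterable of gene names used as the reference set.
    """
    known = set(known)
    reference = np.mean([profiles[g] for g in known], axis=0)
    scores = {
        g: float(np.corrcoef(p, reference)[0, 1])
        for g, p in profiles.items() if g not in known
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Averaging the reference profiles assumes that the known factors share one coordinated pattern. If they fall into several regulatory modules, clustering them first and scoring against each module would be more faithful to a network view.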
Project description: The c-Myc transcription factor is frequently deregulated in cancers. To search for disease diagnostic and druggable targets, a transgenic lung cancer disease model was investigated. Oncogenomics identified c-Myc target genes in lung tumors. These were validated by RT-PCR, Western blotting, EMSA and ChIP-seq data retrieved from public sources. Gene reporter and ChIP assays verified the functional importance of c-Myc binding sites. Clinical significance was established by RT-qPCR in tumor and matched healthy control tissues, by RNA-seq data retrieved from the TCGA Consortium, and by immunohistochemistry data retrieved from the Human Protein Atlas repository. In transgenic lung tumors, 25 novel candidate genes were identified. These code for growth factors, components of Wnt/β-catenin signaling, inhibitors of death receptor signaling, and proteins involved in adhesion and cytoskeleton dynamics, invasion and angiogenesis. For 10 proteins, overexpression was confirmed by IHC, thus demonstrating their druggability. Moreover, c-Myc overexpression caused complete gene silencing of 12 candidate genes, including Bmp6, Fbln1 and Ptprb, which influence lung morphogenesis, invasiveness and cell signaling events. Conversely, among the 75 repressed genes, components of the TNF-α and TGF-β pathways, as well as negative regulators of IGF1 and MAPK signaling, were affected. Additionally, anti-angiogenic, anti-invasive, adhesion, extracellular matrix remodeling and growth-suppressive functions were repressed. For 15 candidate genes, c-Myc-dependent DNA binding and transcriptional responses were confirmed in human lung cancer samples. Finally, Kaplan-Meier survival statistics revealed clinical significance for 59 of 100 candidate genes, confirming their prognostic value. In conclusion, previously unknown c-Myc target genes in lung cancer were identified, which may enable the development of mechanism-based therapies.
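Kaplan-Meier curves alone do not give a p-value. Survival differences between patients with high and low expression of a candidate gene are usually tested with a log-rank test. The abstract does not say how patients were split or which test was used, so the expression-based grouping and the log-rank test below are assumptions. This is a minimal sketch of both steps, not the study's analysis code.

```python
import math

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate.

    time: follow-up times; event: 1 if death was observed, 0 if censored.
    Returns (event_time, survival probability just after that time) pairs.
    """
    data = sorted(zip(time, event))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve

def log_rank(time, event, group):
    """Two-group log-rank test (group is 0/1). Returns (chi-square, p), 1 df."""
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({t for t, e in zip(time, event) if e}):
        n = sum(1 for x in time if x >= t)
        n1 = sum(1 for x, g in zip(time, group) if x >= t and g == 1)
        d = sum(1 for x, e in zip(time, event) if x == t and e)
        d1 = sum(1 for x, e, g in zip(time, event, group) if x == t and e and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
```

For each candidate gene, `group` could mark patients above the median expression, but the cut-off choice affects the result. With 100 genes tested, the p-values would also need multiple-testing correction.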