Project description: Computational integrative analysis has become an important approach in the data-driven exploration of biological problems. Many integration methods for cancer subtyping have been proposed, but evaluating them is difficult because gold standards are lacking. Moreover, questions of practical importance remain open about how the choice of data types and their combinations affects the performance of integrative studies. Here, we constructed three classes of benchmarking datasets for nine cancers in TCGA, covering all eleven combinations of four multi-omics data types. Using these datasets, we comprehensively evaluated ten representative integration methods for cancer subtyping in terms of accuracy (which combines clustering accuracy and clinical significance), robustness, and computational efficiency. We then investigated the influence of different omics data on cancer subtyping and the effectiveness of their combinations. Contrary to the widely held intuition that incorporating more types of omics data always produces better results, our analyses showed that in some situations integrating more omics data degrades the performance of integration methods. Our analyses also suggested several effective combinations for most of the cancers studied, which may be of particular interest to researchers in omics data analysis.
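The count of eleven combinations follows if each combination contains at least two of the four data types (6 pairs, 4 triples, 1 quadruple). This reading is an inference from the arithmetic, and the omics names below are hypothetical placeholders. A minimal Python sketch that enumerates them:

```python
from itertools import combinations

# Hypothetical placeholder names: the abstract does not list the four omics types.
OMICS_TYPES = ["omics_1", "omics_2", "omics_3", "omics_4"]


def multi_omics_combinations(layers):
    """Return every combination that contains at least two distinct omics layers."""
    combos = []
    for size in range(2, len(layers) + 1):
        combos.extend(combinations(layers, size))
    return combos


combos = multi_omics_combinations(OMICS_TYPES)
for combo in combos:
    print(" + ".join(combo))
print(len(combos))  # 6 pairs + 4 triples + 1 quadruple = 11
```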
Project description: Accurate prediction of the molecular subtypes of cancer patients is important for personalized cancer diagnosis and treatment. The large amount of available multi-omics data and advances in data-driven methods are expected to facilitate molecular subtyping of cancer. Most existing machine learning-based methods classify samples using a single omics data type; they fail to integrate multi-omics data to learn comprehensive sample representations, and they ignore that transferring and aggregating information among samples can represent them better and ultimately aid classification. We propose a novel framework, the multi-omics graph convolutional network (M-GCN), for molecular subtyping based on robust graph convolutional networks that integrate multi-omics data. We first apply the Hilbert-Schmidt independence criterion least absolute shrinkage and selection operator (HSIC Lasso) to select molecular subtype-related transcriptomic features and then use these features to construct a low-noise sample-sample similarity graph. Next, we take the selected gene expression, single nucleotide variant (SNV), and copy number variation (CNV) data as input and learn multi-view representations of the samples. On this basis, we finally develop a robust variant of the graph convolutional network (GCN) model that obtains new sample representations by aggregating information over their subgraphs. Experimental results on breast and stomach cancer show that the classification performance of M-GCN is superior to that of other existing methods. Moreover, the identified subtype-specific biomarkers are highly consistent with current clinical understanding and may help with accurate diagnosis and targeted drug development.
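A minimal NumPy sketch of the two graph steps described above, under stated assumptions: a cosine k-nearest-neighbour graph stands in for the sample-sample similarity graph (HSIC Lasso feature selection is not reproduced), and the standard Kipf-Welling propagation rule stands in for the paper's robust GCN variant. The function names, `k`, and the random inputs are illustrative only.

```python
import numpy as np


def knn_similarity_graph(features, k):
    """Symmetric 0/1 adjacency linking each sample to its k most cosine-similar samples."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches from the neighbour search
    n = features.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # symmetrise: keep an edge if either side chose it


def gcn_layer(adj, h, weight):
    """One plain GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ weight, 0.0)


rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 50))        # stand-in for selected expression features
multi_view = rng.normal(size=(20, 30))  # stand-in for the multi-view sample input
adj = knn_similarity_graph(expr, k=5)
w = rng.normal(size=(30, 8))            # random layer weights (untrained)
h1 = gcn_layer(adj, multi_view, w)
print(h1.shape)  # (20, 8)
```

Stacking such layers lets each sample's new representation draw on progressively larger neighbourhoods of the similarity graph.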
Project description: Oncogenesis and cancer can arise from a wide range of genomic aberrations, including mutations, copy number alterations, expression changes and epigenetic modifications, that span multiple omics layers. Integrating genomic, transcriptomic, proteomic and epigenomic datasets via multi-omics analysis offers the opportunity to gain a deeper, more holistic understanding of cancer development and progression. There are two primary approaches to integrating multi-omics data: multi-staged (focused on identifying genes driving cancer) and meta-dimensional (focused on establishing clinically relevant tumour or sample classifications). Several ready-to-use bioinformatics tools are available for both multi-staged and meta-dimensional integration of multi-omics data. In this study, we compared nine integration tools using real and simulated cancer datasets. The performance of the multi-staged integration tools was assessed at the gene, function and pathway levels, while the meta-dimensional integration tools were assessed on sample classification performance. Additionally, we discuss how factors such as data representation, sample size, signal and noise influence multi-omics data integration. Our results provide current and much-needed guidance on selecting and using the most appropriate and best-performing multi-omics integration tools.
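The abstract does not name the metric used to score sample classification. As an assumption, the adjusted Rand index (ARI), a common choice when reference labels are available, is sketched below in plain Python; it equals 1 for identical partitions (up to relabelling) and is close to 0 for chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two samples")
    # Pairs of samples grouped together in each contingency cell, row and column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, swapped labels)
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5 (worse than chance)
```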
Project description: Muscle-invasive bladder cancer (MIBC) is the most common urinary system carcinoma associated with poor outcomes. A robust classification system for prognostic prediction of MIBC is needed. Although omics data of MIBC at multiple levels are increasingly available, few integration methods have been used to classify MIBC in a way that reflects patient prognosis. In this study, we constructed an autoencoder-based deep learning framework to integrate multi-omics data of MIBC and clustered the samples into two subgroups with a significant difference in overall survival (P = 8.11 × 10⁻⁵). These two subtypes, which serve as a prognostic factor independent of clinical information, show several significant genomic differences. Notably, the poor-prognosis subtype had a significantly higher frequency of chromosome 3p deletion. Immune decomposition analysis showed that the two MIBC subtypes differed in immune components, including M1 macrophages, resting NK cells, regulatory T cells, plasma cells, and naïve B cells. Hallmark gene set enrichment analysis, performed to investigate functional differences between the two subtypes, revealed that activated IL-6/JAK/STAT3 signaling, the interferon-alpha response, the reactive oxygen species pathway, and the unfolded protein response were significantly enriched among genes upregulated in the high-risk subtype. We constructed MIBC subtyping models based on multi-omics data and on single-omics data, respectively; internal and external validation datasets confirmed the robustness of the prediction model and its prognostic ability (P < 0.05 in all datasets). Finally, through bioinformatics analysis and immunohistochemistry experiments, we found that KRT7 can serve as a biomarker of MIBC risk.
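The abstract reports a P value for the overall survival difference without naming the test; a two-group log-rank test is the usual choice for such comparisons and is assumed here. The sketch below is a plain NumPy implementation on toy data, not the authors' pipeline.

```python
import math

import numpy as np


def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test returning (chi-square statistic, p-value, 1 df).

    `event_*` hold 1 for an observed death and 0 for a censored follow-up time.
    """
    time = np.concatenate([time_a, time_b]).astype(float)
    event = np.concatenate([event_a, event_b]).astype(int)
    in_a = np.concatenate([np.ones(len(time_a), bool), np.zeros(len(time_b), bool)])

    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0
    for t in np.unique(time[event == 1]):  # distinct death times
        at_risk = time >= t
        n = at_risk.sum()
        n_a = (at_risk & in_a).sum()
        died = (time == t) & (event == 1)
        d = died.sum()
        d_a = (died & in_a).sum()
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of deaths in group A at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    stat = o_minus_e ** 2 / variance
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return float(stat), math.erfc(math.sqrt(stat / 2))


# Toy data: group A dies early, group B late; every time is an observed death.
stat, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(round(stat, 2), p)
```

Censored patients contribute to the at-risk counts until their follow-up time but never to the death counts, which is why the event indicator is tracked separately from the time.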
Project description: Background: Multi-omics experimental approaches are becoming common practice in the biological and medical sciences, underlining the need for new integrative techniques and applications that enable multi-scale characterization of biological systems. Integrative analysis of heterogeneous datasets generally yields additional insights and generates novel hypotheses about a given biological system. However, it can be challenging given the often large size of omics datasets and the diversity of existing techniques. Moreover, visualization tools for interpretation are usually inaccessible to biologists without programming skills. Results: Here, we present MiBiOmics, a web-based and standalone application that facilitates multi-omics data visualization, exploration, integration, and analysis by providing easy access to dedicated, interactive protocols. It implements classical ordination techniques and the inference of omics-based (multilayer) networks to mine complex biological systems and identify robust biomarkers linked to specific contextual parameters or biological states. Conclusions: MiBiOmics provides easy access to exploratory ordination techniques and to a network-based approach for integrative multi-omics analyses through an intuitive, interactive interface. MiBiOmics is currently available as a Shiny app at https://shiny-bird.univ-nantes.fr/app/Mibiomics and as a standalone application at https://gitlab.univ-nantes.fr/combi-ls2n/mibiomics .
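The abstract does not specify how MiBiOmics infers its networks. Purely as an assumption, the sketch below shows a WGCNA-style weighted co-expression network, in which edge weights are absolute Pearson correlations raised to a soft-thresholding power `beta`, and weighted connectivity flags candidate hub features. It is a generic illustration, not MiBiOmics code; the function names and parameter values are hypothetical.

```python
import numpy as np


def soft_threshold_network(data, beta=6):
    """Weighted feature-feature adjacency |Pearson r| ** beta (WGCNA-style, assumed).

    `data` is a samples x features matrix; constant features would yield NaN
    correlations and should be filtered out beforehand.
    """
    cor = np.corrcoef(data, rowvar=False)  # features are columns
    adj = np.abs(cor) ** beta  # high powers suppress weak, likely noisy correlations
    np.fill_diagonal(adj, 0.0)  # drop self-edges
    return adj


def connectivity(adj):
    """Weighted degree of each feature; high values flag candidate hub features."""
    return adj.sum(axis=1)


rng = np.random.default_rng(0)
data = rng.normal(size=(40, 10))
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=40)  # make features 0 and 1 co-vary
adj = soft_threshold_network(data, beta=6)
k = connectivity(adj)
print(np.argsort(-k)[:2])  # features 0 and 1 should rank highest
```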
Project description: There is a growing number of multi-domain genomic datasets for human tumors. Multi-domain data are usually interpreted by analyzing each single-domain dataset separately and integrating the results post hoc. Data fusion techniques allow true integration of multi-domain data, ideally improving tumor classification for prognosis and prediction of response to therapy. We have previously described the joint singular value decomposition (jSVD) technique as a means of data fusion. Here, we report open-source implementations of these methods in R and Python and their application. The Cancer Genome Atlas (TCGA) Skin Cutaneous Melanoma (SKCM) dataset was used as a benchmark to evaluate whether data fusion approaches improve molecular classification of cancers in a clinically relevant manner. Our data show that the data fusion approach does not produce classification results superior to those obtained using single-domain data. Data from different domains are not entirely independent of each other, and molecular classes are characterized by features that span different domains. Data fusion techniques might be better suited to response prediction, where they could help identify predictive features in a domain-independent manner for use as biomarkers.
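The abstract does not define jSVD. As a generic, clearly hypothetical stand-in for SVD-based data fusion, the sketch below centres each domain, scales it to unit Frobenius norm, concatenates the domains along the feature axis, and takes a single SVD, giving sample scores shared across domains plus per-domain loadings. The authors' jSVD formulation may differ, and `concatenated_svd` is an illustrative name.

```python
import numpy as np


def concatenated_svd(blocks, rank):
    """Shared sample factors from an SVD of column-concatenated domain blocks.

    Each block is (samples x features) for the same samples in the same order.
    Blocks are centred and scaled to unit Frobenius norm so that no domain
    dominates simply because it has more features or larger values.
    """
    n = blocks[0].shape[0]
    scaled = []
    for block in blocks:
        if block.shape[0] != n:
            raise ValueError("all blocks must share the same samples (rows)")
        centred = block - block.mean(axis=0)
        scaled.append(centred / np.linalg.norm(centred))  # Frobenius norm for 2-D input
    joint = np.hstack(scaled)
    u, s, vt = np.linalg.svd(joint, full_matrices=False)
    # Split the leading right singular vectors back into per-domain loadings.
    splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
    loadings = np.split(vt[:rank].T, splits, axis=0)
    return u[:, :rank] * s[:rank], loadings


rng = np.random.default_rng(0)
expr = rng.normal(size=(15, 100))  # random stand-in for one domain
meth = rng.normal(size=(15, 40))   # random stand-in for a second domain
scores, loadings = concatenated_svd([expr, meth], rank=3)
print(scores.shape, [l.shape for l in loadings])  # (15, 3) [(100, 3), (40, 3)]
```

The shared scores could then feed any clustering method, which is where a comparison against single-domain classification would take place.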
Project description: We present an innovative strategy for integrating whole-genome multi-omics data that enables adaptive fusion by leveraging hidden-layer features derived from high-dimensional omics data through a multi-task encoder. Empirical evaluations on eight benchmark cancer datasets showed that the proposed framework outperformed the compared algorithms in cancer subtyping. Building on these subtyping results, we established a robust pipeline for identifying genome-wide biomarkers, which uncovered 195 significant biomarkers. Furthermore, we conducted an exhaustive analysis to assess the importance of each omics type and of non-coding region features at the whole-genome level during cancer subtyping. Our investigation shows that both omics and non-coding region features substantially affect cancer development and survival prognosis. This study emphasizes the potential and practical implications of integrating genome-wide data in cancer research and demonstrates the power of comprehensive genomic characterization. Additionally, our findings offer insights for multi-omics analysis using deep learning methodologies.
Project description: MOTIVATION: Cancer subtypes have usually been defined based on molecular characterization of single-omic data. Increasingly, measurements of multiple omic profiles for the same cohort are available. Defining cancer subtypes using multi-omic data may improve our understanding of cancer and suggest more precise treatment for patients. RESULTS: We present NEMO (NEighborhood based Multi-Omics clustering), a novel algorithm for multi-omics clustering. Importantly, NEMO can be applied to partial datasets in which some patients have data for only a subset of the omics, without performing data imputation. In extensive testing on ten cancer datasets spanning 3168 patients, NEMO achieved results comparable to the best of nine state-of-the-art multi-omics clustering algorithms on full data and showed an improvement on partial data. On some of the partial data tests, PVC, a multi-view algorithm, performed better, but it is limited to two omics and to positive partial data. Finally, we demonstrate the advantage of NEMO in a detailed analysis of partial data from AML patients. NEMO is fast, much simpler than existing multi-omics clustering algorithms, and avoids iterative optimization. AVAILABILITY AND IMPLEMENTATION: Code for NEMO and for reproducing all NEMO results in this paper is on GitHub: https://github.com/Shamir-Lab/NEMO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
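A simplified sketch of the neighbourhood-based idea behind NEMO for partial data: each omic yields a k-nearest-neighbour relative similarity matrix over only the samples measured by that omic, and the integrated affinity for a pair of samples averages over the omics both samples share, so no imputation is needed. Assumptions: a single median-distance Gaussian bandwidth replaces NEMO's local bandwidth, the final spectral clustering step is omitted, and all function names are hypothetical. The reference implementation is in the GitHub repository cited in the abstract.

```python
import numpy as np


def knn_relative_similarity(data, k):
    """Simplified NEMO-style relative similarity for one omic (samples x features)."""
    sq = np.sum(data ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * data @ data.T, 0.0)
    sigma2 = np.median(d2[d2 > 0])  # one global bandwidth (simplification)
    w = np.exp(-d2 / sigma2)
    search = w.copy()
    np.fill_diagonal(search, -np.inf)  # a sample is not its own neighbour
    n = data.shape[0]
    neighbours = np.argsort(-search, axis=1)[:, :k]
    in_knn = np.zeros((n, n), dtype=bool)
    in_knn[np.repeat(np.arange(n), k), neighbours.ravel()] = True
    # Normalise each row by the total similarity to that sample's k neighbours.
    knn_sums = np.take_along_axis(w, neighbours, axis=1).sum(axis=1)
    rs = np.where(in_knn, w / knn_sums[:, None], 0.0)
    # Count both directions: j among i's neighbours and i among j's neighbours.
    return rs + rs.T


def nemo_like_affinity(omics, k=5):
    """Average per-omic relative similarities over the omics both samples have.

    `omics` maps a name to (sample_indices, data); rows of `data` correspond to
    `sample_indices` in a global index space, and missing samples are simply absent.
    """
    n = 1 + max(int(idx.max()) for idx, _ in omics.values())
    total = np.zeros((n, n))
    counts = np.zeros((n, n))
    for idx, data in omics.values():
        grid = np.ix_(idx, idx)
        total[grid] += knn_relative_similarity(data, k)
        counts[grid] += 1
    return np.divide(total, counts, out=np.zeros_like(total), where=counts > 0)


rng = np.random.default_rng(1)
omics = {
    "expression": (np.arange(12), rng.normal(size=(12, 40))),  # all 12 samples
    "methylation": (np.arange(8), rng.normal(size=(8, 60))),   # samples 0-7 only
}
affinity = nemo_like_affinity(omics, k=3)
print(affinity.shape)  # (12, 12)
```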
Project description: Classification of ovarian cancer by morphologic features has limited impact on the treatment and prognosis of serous ovarian cancer (SOC). Here, we propose a new system for SOC subtyping based on the molecular categories from The Cancer Genome Atlas project. We analyzed DNA methylation, protein, microRNA, and gene expression data of 1203 samples from 599 serous ovarian cancer patients. These samples were divided into nine subtypes based on RNA-seq data, and each subtype was associated with the activation and/or suppression of four biological processes: immune activity, hormone metabolism, mesenchymal development and the MAPK signaling pathway. We also identified four DNA methylation, two protein expression, six microRNA sequencing and four pathway subtypes. By integrating the subtyping results across different omics platforms, we found that most RNA-seq subtypes overlapped with one or two subtypes from other omics data. Our study sheds light on the molecular mechanisms of SOC and provides a new perspective for more accurate stratification of its subtypes.
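The abstract does not say how overlap between platform-specific subtypes was measured. One simple, assumed approach is a contingency count of shared samples, keeping for each RNA-seq subtype the partner subtypes that hold at least a chosen fraction of its samples; the labels and the `min_fraction` threshold below are hypothetical.

```python
from collections import Counter


def overlap_table(labels_a, labels_b):
    """Count samples shared by each pair of subtypes from two omics platforms."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must describe the same samples")
    return Counter(zip(labels_a, labels_b))


def dominant_partners(labels_a, labels_b, min_fraction=0.2):
    """For each subtype in labels_a, list subtypes in labels_b holding >= min_fraction of its samples."""
    table = overlap_table(labels_a, labels_b)
    sizes = Counter(labels_a)
    partners = {}
    for (a, b), count in sorted(table.items()):
        if count / sizes[a] >= min_fraction:
            partners.setdefault(a, []).append(b)
    return partners


rna = ["R1", "R1", "R1", "R2", "R2", "R2"]   # hypothetical RNA-seq subtypes
meth = ["M1", "M1", "M2", "M3", "M3", "M3"]  # hypothetical methylation subtypes
print(dominant_partners(rna, meth, min_fraction=0.3))
# {'R1': ['M1', 'M2'], 'R2': ['M3']}
```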
Project description: Because cancer is a highly heterogeneous and complex disease, identifying its molecular subtypes is crucial for accurate diagnosis and personalized treatment. Integrating multi-omics data enables a comprehensive interpretation of the molecular characteristics of cancer at various biological levels. In recent years, a growing number of multi-omics clustering algorithms for cancer molecular subtyping have been proposed. However, the absence of a definitive gold standard makes it difficult to evaluate and compare these methods effectively. In this study, we developed a general framework for the comprehensive evaluation of multi-omics clustering algorithms and introduced a new metric, the accuracy-weighted average index, which considers clustering performance and clinical relevance simultaneously. Using this framework, we thoroughly evaluated and compared 11 state-of-the-art multi-omics clustering algorithms, including deep learning-based methods. Combining the accuracy-weighted average index with computational efficiency, our analysis shows that PIntMF has the best overall performance, making it a promising tool for molecular subtyping across a wide range of cancers.