Project description:Early identification of gastric cancer (GC) is associated with a superior survival rate compared to advanced GC. However, the poor specificity and sensitivity of traditional biomarkers underline the importance of identifying more effective biomarkers. This study aimed to identify novel biomarkers for the prognosis of GC, construct a risk score (RS) signature based on these biomarkers, and validate its predictive performance. We used multi-omics data from The Cancer Genome Atlas to analyze the significance of differences in each omics data type and combined the results using Fisher's method. Hub genes were subsequently subjected to univariate Cox and LASSO regression analyses and used to construct the RS signature. The RS of each patient was calculated, and the patients were divided into two subgroups according to the RS. The RS signature was validated in two independent datasets from the Gene Expression Omnibus, and further analyses were conducted. Five immune-related genes strongly linked to the prognosis of GC patients were obtained: CGB5, SLC10A2, THPO, PDGFRB, and APOD. The results revealed significant differences in overall survival between the two subgroups (p < 0.001) and indicated the high accuracy of the RS signature. When validated in two independent datasets, the results were consistent with those in the training dataset (p = 0.003 and p = 0.001). Subsequent analyses revealed that the RS signature is independent and broadly applicable across various GC subtypes. In conclusion, we used multi-omics data to obtain five immune-related genes comprising the RS signature, which can independently and effectively predict the prognosis of GC patients with high accuracy.
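Two steps in this workflow are compact enough to sketch. The first is Fisher's method for combining per-omics p-values. The second is the risk score, computed as a coefficient-weighted sum of gene expression and followed by a two-group split.

This is a minimal sketch under stated assumptions. The summary does not say how the two subgroups were defined, so the median cut-off used here is an assumption. Any gene names and coefficients passed in are hypothetical placeholders, not the fitted values for CGB5, SLC10A2, THPO, PDGFRB, and APOD.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the global null."""
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom (2k), the chi-square survival function
    # has a closed form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def risk_scores(expression, coefficients):
    """RS = sum over signature genes of coefficient * expression.
    `expression` maps patient -> {gene: value}; both inputs are hypothetical."""
    return {pid: sum(coef * expr[g] for g, coef in coefficients.items())
            for pid, expr in expression.items()}

def split_by_median(scores):
    """Assumed grouping rule: RS above the median -> 'high', otherwise 'low'."""
    ordered = sorted(scores.values())
    n = len(ordered)
    median = (ordered[(n - 1) // 2] + ordered[n // 2]) / 2.0
    return {pid: ("high" if s > median else "low") for pid, s in scores.items()}
```

For example, combining three per-omics p-values of 0.01, 0.02, and 0.03 gives a much smaller combined p-value. That behavior is why Fisher's method rewards genes that are consistently significant across data types.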
Project description:Breast cancer is a cancer of high complexity and heterogeneity, with differences in prognosis and survival among patients of different subtypes. Copy number variations (CNVs) within enhancers are crucial drivers of tumorigenesis because they influence the expression of their target genes. In this study, we used an integrative approach to identify CNA-driven enhancers and their effect on the expression of target genes in four breast cancer subtypes by integrating expression data, copy number data, and H3K27ac data. We identified 672, 555, 531, and 361 CNA-driven enhancer-gene pairs and 280, 189, 113, and 98 CNA-driven enhancer-lncRNA pairs in the Basal-like, Her2, LumA, and LumB subtypes, respectively. We then reconstructed a CNV-driven enhancer-lncRNA-mRNA regulatory network in each subtype. Functional analysis showed that CNA-driven enhancers play an important role in the progression of breast cancer subtypes by influencing the P53 signaling pathway, the PPAR signaling pathway, systemic lupus erythematosus, and the MAPK signaling pathway in the Basal-like, Her2, LumA, and LumB subtypes, respectively. We characterized the potential prognostic value of target genes of CNV-driven enhancers and of lncRNA-mRNA pairs in each subtype-specific network. We identified MUM1 and AC016876.1 as prognostic biomarkers in the LumA and Basal-like subtypes, respectively. Higher expression of MUM1 with an amplified enhancer was associated with poorer prognosis in LumA patients. Lower expression of AC016876.1 with a deleted enhancer was associated with poorer survival in Basal-like patients. We also identified enhancer-related lncRNA-mRNA pairs as prognostic biomarkers, including AC012313.2-MUM1 in the LumA, AC026471.4-PLK5 in the LumB, AC027307.2-OAZ1 in the Basal-like, and AC022431.1-HCN2 in the Her2 subtypes. Finally, our results highlight that target genes of CNA-driven enhancers and enhancer-related lncRNA-mRNA pairs could act as prognostic markers and potential therapeutic targets in breast cancer subtypes.
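One way to picture the "CNA-driven" criterion is as a filter on candidate enhancer-gene pairs. A pair is kept when the enhancer's copy number tracks the target gene's expression across samples.

This is a hypothetical sketch, not the study's pipeline. It assumes the candidate pairs (for example, H3K27ac-defined enhancers linked to nearby genes) are already given. It uses Pearson correlation with an illustrative cut-off `min_r`. Both the statistic and the threshold are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cna_driven_pairs(enhancer_cn, gene_expr, candidate_pairs, min_r=0.3):
    """Keep enhancer-gene pairs whose enhancer copy number is positively
    correlated with target-gene expression across the same samples.
    `min_r` is a hypothetical threshold, not a value from the study."""
    hits = []
    for enhancer, gene in candidate_pairs:
        r = pearson(enhancer_cn[enhancer], gene_expr[gene])
        if r >= min_r:
            hits.append((enhancer, gene, r))
    return hits
```

Running the filter separately on the samples of each subtype would yield subtype-specific pair lists, analogous to the per-subtype counts reported above.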
Project description:Purpose: To identify a gene signature for the prognosis of breast cancer using high-throughput analysis. Methods: RNASeq, single nucleotide polymorphism (SNP), and copy number variation (CNV) data and clinical follow-up information were downloaded from The Cancer Genome Atlas (TCGA) and randomly divided into training and verification sets. Genes related to breast cancer prognosis and differentially expressed genes (DEGs) with CNV or SNP were screened from the training set and then integrated for feature selection with RandomForest to identify robust biomarkers. Finally, a gene-related prognostic model was established, and its performance was verified in the TCGA test set, a Gene Expression Omnibus (GEO) validation set, and breast cancer subtypes. Results: A total of 2287 prognosis-related genes, 131 genes with amplified copy numbers, 724 genes with copy number deletions, and 280 genes with significant mutations screened from Genomic Variants were closely correlated with the development of breast cancer. A total of 120 candidate genes were obtained by integrating genes from Genomic Variants with those related to prognosis. RandomForest feature selection then top-ranked 6 characteristic genes (CD24, PRRG1, IQSEC3, MRGPRX, RCC2, and CASP8), several of which have previously been reported to be associated with the progression of breast cancer. Cox regression analysis was performed to establish a 6-gene signature, which can stratify the risk of samples from the training, test, and external validation sets; moreover, the five-year survival AUC of the model was higher than 0.65 in both the training and validation sets. Thus, the 6-gene signature developed in the current study could serve as an independent prognostic factor for breast cancer patients. Conclusion: This study constructed a 6-gene signature as a novel prognostic marker for predicting the survival of breast cancer patients, providing new diagnostic/prognostic biomarkers and therapeutic targets for breast cancer patients.
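The candidate-integration and ranking steps reduce to set operations followed by a sort on feature importance.

This is a minimal sketch that assumes random-forest importance scores have already been computed elsewhere. The gene names and scores in any call are hypothetical. Reading "integrating" as an intersection of prognosis-related genes with the union of variant gene sets is also an assumption.

```python
def integrate_candidates(prognostic, amplified, deleted, mutated):
    """Candidates = prognosis-related genes that also carry a genomic variant
    (amplification, deletion, or significant mutation)."""
    variant_genes = amplified | deleted | mutated
    return prognostic & variant_genes

def top_ranked(importances, candidates, k=6):
    """Return the k candidates with the highest (precomputed) importance."""
    ranked = sorted((g for g in candidates if g in importances),
                    key=lambda g: importances[g], reverse=True)
    return ranked[:k]
```

With `k=6`, `top_ranked` mirrors the selection of six characteristic genes that feed the Cox model.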
Project description:Tumor heterogeneity remains a major challenge for disease subtyping, risk stratification, and accurate clinical management. Exosome-based liquid biopsy can effectively overcome the limitations of tissue biopsy by enabling minimally invasive sampling, multi-point dynamic monitoring, and good prognosis assessment, and it has broad clinical prospects. However, a comprehensive analysis of tumor-derived exosome (TDE)-based risk stratification and prognostic assessment for breast cancer, with systematic dissection of biological heterogeneity, is still lacking. In this study, the robust corroborative analysis for biomarker discovery (RCABD) strategy was used for exosome molecule identification, differential expression verification, risk prediction modeling, and heterogeneity dissection. It drew on multi-omics data (6101 molecules), our ExoBCD database (306 molecules), and 53 independent studies (481 molecules). Our results showed that a 10-molecule exosome-derived signature (exoSIG) could successfully achieve breast cancer risk stratification, making it a novel and accurate exosome prognostic indicator (Cox P = 9.9E-04, HR = 3.3, 95% CI 1.6-6.8). Interestingly, HLA-DQB2 and COL17A1, which are closely related to tumor metastasis, achieved high performance in prognosis prediction (86.35% contribution) and accuracy (log-rank P = 0.028, AUC = 85.42%). Combined with patient age and tumor stage, they formed a bimolecular risk signature (Clinmin-exoSIG) and a convenient nomogram as operable tools for clinical applications. In conclusion, as an extension of ExoBCD, this study conducted systematic analyses to identify a prognostic multi-molecular panel and risk signature, stratify patients, and dissect biological heterogeneity based on breast cancer exosomes from a multi-omics perspective. Our results provide an important reference for in-depth exploration of the "biological heterogeneity - risk stratification - prognosis prediction" relationship.
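The reported HR and 95% CI follow the standard Cox/Wald relationship. The hazard ratio is exp(beta), and the interval is exp(beta ± 1.96·SE). A short helper makes the link between these numbers explicit.

This is a generic sketch, not the authors' code. Because the published values are rounded, recovering SE from the interval and re-deriving the bounds is only approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard-normal quantile

def hazard_ratio_ci(beta, se, z=Z_95):
    """HR = exp(beta); Wald 95% CI = (exp(beta - z*se), exp(beta + z*se))."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=Z_95):
    """Back-calculate SE(beta) from a reported HR confidence interval,
    using the symmetry of the interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Round trip using the rounded exoSIG figures (HR = 3.3, 95% CI 1.6-6.8).
se = se_from_ci(1.6, 6.8)
hr, lo, hi = hazard_ratio_ci(math.log(3.3), se)
```

The recovered SE is about 0.37 on the log-hazard scale, and the re-derived bounds land close to 1.6 and 6.8.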
Project description:Cancer-associated fibroblasts (CAFs) are heterogeneous constituents of the tumor microenvironment involved in the tumorigenesis, progression, and therapeutic responses of tumors. This study identified four distinct CAF subtypes of breast cancer (BRCA) using single-cell RNA sequencing (RNA-seq) data. Of these, matrix CAFs (mCAFs) were significantly associated with tumor matrix remodeling and strongly correlated with the transforming growth factor (TGF)-β signaling pathway. Consensus clustering of The Cancer Genome Atlas (TCGA) BRCA dataset using mCAF single-cell characteristic gene signatures segregated samples into high-fibrotic and low-fibrotic groups. Patients in the high-fibrotic group had a significantly poorer prognosis. Weighted gene co-expression network analysis and univariate Cox analysis of bulk RNA-seq data revealed 17 differentially expressed genes with prognostic value. The mCAF risk prognosis signature (mRPS) was developed using 10 machine learning algorithms. The mRPS predicted clinical outcomes more accurately than the conventional TNM staging system. The mRPS was correlated with the infiltration level of anti-tumor effector immune cells. Based on consensus prognostic genes, six machine learning algorithms (accuracy > 90%) classified BRCA samples into two subtypes: interferon (IFN)-γ-dominant (immune C2) and TGF-β-dominant (immune C6). Patients with a low mRPS had an improved prognosis, suggesting that they may benefit from immunotherapy. Thus, the mRPS model can stably predict BRCA prognosis, reflect the local immune status of the tumor, and aid clinical decisions on tumor immunotherapy.
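The summary does not name the metric behind "more accurately than TNM staging". Harrell's concordance index (C-index) is a common choice for comparing survival risk scores, so the sketch below assumes it.

The C-index is the fraction of comparable patient pairs in which the patient who fails first has the higher predicted risk. This is an illustrative implementation only. It is not the study's evaluation code and does not handle tied event times specially.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C: among pairs where patient i has an observed event
    before patient j's time, count pairs with risk[i] > risk[j] as
    concordant and tied risks as half-concordant."""
    concordant, tied, comparable = 0.0, 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable
```

Evaluating the same cohort once with the mRPS as `risk` and once with TNM stage as `risk` gives two directly comparable C-index values.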
Project description:There is an unmet clinical need to identify patients with early-stage non-small cell lung cancer (NSCLC) who are likely to develop recurrence and to predict their therapeutic responses. Our previous study developed a qRT-PCR-based seven-gene microfluidic assay to predict recurrence risk and the clinical benefits of chemotherapy. This study showed that it was feasible to apply this seven-gene panel to RNA sequencing profiles of The Cancer Genome Atlas (TCGA) NSCLC patients (n = 923) in randomly partitioned feasibility-training and validation sets (p < 0.05, Kaplan-Meier analysis). Using Boolean implication networks, the DNA copy number variation-mediated transcriptional regulatory network of the seven-gene signature was identified in multiple NSCLC cohorts (n = 371). The multi-omics network genes, including PD-L1, were significantly correlated with immune infiltration and with drug response to 10 commonly used drugs for treating NSCLC. Western blots of NSCLC cell lines showed that ZNF71 protein expression was positively correlated with epithelial markers and negatively correlated with mesenchymal markers. Using CRISPR-Cas9 and RNA interference (RNAi) profiles, PI3K was identified as a relevant pathway in the proliferation networks involving ZNF71 and its isoforms. Based on the gene expression of the multi-omics network, candidate drugs for repositioning were identified for NSCLC treatment.
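Boolean implication networks binarize each gene (or copy number profile) into low/high states. A rule such as "A high ⇒ B high" is then inferred when one quadrant of the 2×2 scatter, here (A high, B low), is nearly empty.

This is a simplified, hypothetical sketch of that idea. The published Boolean implication approach uses a statistical sparsity test per quadrant. The fixed `max_error` rate and `min_count` guard below are stand-in assumptions, and the thresholds `a_cut`/`b_cut` are assumed to be supplied.

```python
def boolean_implications(a, b, a_cut, b_cut, max_error=0.05, min_count=5):
    """Return rules (a_state, b_state) meaning 'A in a_state => B in b_state'.
    A rule holds when, among samples with A in a_state, the fraction whose
    B is NOT in b_state is at most max_error (i.e., that quadrant is sparse)."""
    a_hi = [x > a_cut for x in a]
    b_hi = [y > b_cut for y in b]
    rules = []
    for a_state in (False, True):
        group = [bh for ah, bh in zip(a_hi, b_hi) if ah == a_state]
        if len(group) < min_count:
            continue  # too few samples in this A state to judge
        for b_state in (False, True):
            violations = sum(1 for bh in group if bh != b_state)
            if violations / len(group) <= max_error:
                rules.append(("high" if a_state else "low",
                              "high" if b_state else "low"))
    return rules
```

Applying this pairwise, for example between a gene's copy number and another gene's expression, gives the directed edges of an implication network.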
Project description:Patients with recurrent or metastatic cervical cancer are in urgent need of novel approaches to prognosis assessment and treatment. In this study, a novel prognostic gene signature was discovered by utilizing cuproptosis-related angiogenesis (CuRA) gene scores obtained through weighted gene co-expression network analysis (WGCNA) of The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets. To enhance its reliability, the gene signature was refined by integrating supplementary clinical variables and subjected to cross-validation. Meanwhile, activation of the VEGF pathway was inferred from an analysis of cell-to-cell communication based on the expression of ligands and receptors in cell transcriptomic datasets. High-CuRA patients had less infiltration of CD8+ T cells and reduced expression of most immune checkpoint genes, indicating that immunotherapy may be more difficult in these patients. Lower IC50 values of imatinib, pazopanib, and sorafenib in the high-CuRA group revealed the potential value of these drugs. Finally, we verified that SFT2D1, an independent prognostic gene, was highly expressed in cervical cancer and positively correlated with microvascular density. Knockdown of SFT2D1 significantly inhibited the proliferation, migration, and invasion of cervical cancer cells. The CuRA gene signature provides valuable insights into the prognosis and immune microenvironment of cervical cancer and could help develop new strategies for individualized precision therapy for cervical cancer patients.
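At its core, WGCNA builds an unsigned adjacency a_ij = |cor(x_i, x_j)|^β. Raising correlations to a soft-threshold power β suppresses weak links. Ranking genes by connectivity (the row sums of the adjacency) is one way hub genes are nominated.

This sketch covers only that adjacency/connectivity step. It assumes β = 6, a commonly used default for unsigned networks, not a value reported here. Module detection, the CuRA scores, and the clinical refinement are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency: |cor|^beta, zero on the diagonal.
    `expr` maps gene -> expression values across the same samples."""
    genes = list(expr)
    return {g: {h: 0.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
                for h in genes}
            for g in genes}

def connectivity(adj):
    """Whole-network connectivity k_i = sum_j a_ij; high k_i suggests a hub."""
    return {g: sum(row.values()) for g, row in adj.items()}
```

Because β = 6 shrinks a correlation of 0.45 to under 0.01 while leaving perfect correlations at 1, weakly co-expressed genes end up with low connectivity.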
Project description:Genomic and transcriptomic image data, represented by DNA and RNA fluorescence in situ hybridization (FISH), respectively, together with proteomic data, particularly on nuclear proteins, can help elucidate gene regulation in relation to the spatial positions of chromatin, messenger RNAs, and key proteins. However, methods for image-based multi-omics data collection and analysis are lacking. To this end, we aimed to develop the first integrative browser, iSMOD (image-based Single-cell Multi-omics Database), to collect and browse comprehensive FISH and nuclear proteomics data based on the titles, abstracts, and related experimental figures of published papers. iSMOD integrates multi-omics studies focusing on the key players in the cell nucleus from more than 20,000 published papers, and the collection is still growing. We have also provided several exemplar demonstrations of iSMOD's wide applications: profiling multi-omics research to reveal molecular targets for diseases, exploring the working mechanisms behind biological phenomena using multi-omics interactions, and integrating 3D multi-omics data in a virtual cell nucleus. iSMOD is a cornerstone for delineating a global view of relevant research and enabling the integration of scattered data. It thus provides new insights into the missing components of molecular pathway mechanisms and facilitates more efficient scientific research.
Project description:N-7 methylguanine (m7G) is one of the most common RNA base modifications in post-transcriptional regulation and participates in multiple processes during the mRNA life cycle, such as transcription, mRNA splicing, and translation. However, its expression and prognostic value in uterine corpus endometrial carcinoma (UCEC) have not been systematically studied. In this paper, data including gene expression profiles, clinical data of UCEC patients, somatic mutations, and copy number variants (CNVs) are obtained from The Cancer Genome Atlas (TCGA) and UCSC Xena. The expression differences of m7G-related mRNAs in UCEC are analyzed and correlation network maps are plotted. A risk score model composed of four m7G-related mRNAs (NSUN2, NUDT3, LARP1, and NCBP3) is then constructed using least absolute shrinkage and selection operator (LASSO) and univariate and multivariate Cox regression to predict prognosis and immune response. The association between the m7G-related mRNAs and clinical prognosis in UCEC is analyzed using the Kaplan-Meier method, receiver operating characteristic (ROC) curves, principal component analysis (PCA), t-SNE, decision curve analysis (DCA), a nomogram, and related methods. A high risk score is significantly correlated with poorer overall survival (OS) in patients with UCEC (P < 0.001) and is an independent risk factor affecting OS. Differentially expressed genes between the high- and low-risk groups are identified in R, and functional and pathway enrichment analyses are performed. Single-sample gene set enrichment analysis (ssGSEA), immune checkpoints, m6A-related genes, tumor mutation burden (TMB), stem cell correlation, tumor immune dysfunction and exclusion (TIDE) scores, and drug sensitivity are also used to study the risk model. In addition, 3 genotypes are obtained by consensus clustering, and they are significantly related to OS and progression-free survival (PFS) (P < 0.001). The deconvolution algorithm CIBERSORT is applied to calculate the proportions of 22 types of tumor-infiltrating immune cells (TICs) in UCEC patients, and the ESTIMATE algorithm is applied to estimate the immune and stromal components. In summary, m7G-related mRNAs may become potential biomarkers for UCEC prognosis and may promote UCEC occurrence and development by regulating the cell cycle and immune cell infiltration. They are expected to become potential therapeutic targets in UCEC.
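The survival comparison between high- and low-risk groups rests on the Kaplan-Meier estimator. At each distinct event time t, survival is multiplied by (1 − d_t / n_t), where d_t is the number of deaths at t and n_t is the number still at risk.

This is a minimal, generic implementation for illustration. The log-rank test that would compare the two curves is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate. `events[i]` is 1 for an observed death and 0
    for censoring. Returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths, removed = 0, 0
        # Group all subjects sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # censored subjects leave the risk set after t
    return curve
```

Calling `kaplan_meier` separately on the high-risk and low-risk patients gives the two curves that a Kaplan-Meier plot of the risk model displays.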
Project description:Biomarker signature identification in "omics" data is a complex challenge that requires specialized feature selection algorithms. The objective of these algorithms is to select the smallest set(s) of molecular quantities that are able to predict a given outcome (target) with maximal predictive performance. This task is even more challenging when the outcome comprises multiple classes; for example, one may be interested in identifying the genes whose expression allows discrimination among different types of cancer (nominal outcome) or among different stages of the same cancer, e.g., Stages 1, 2, 3, and 4 of lung adenocarcinoma (ordinal outcome). In this work, we consider a particular class of successful feature selection methods, namely constraint-based, local causal discovery algorithms. These algorithms depend on performing a series of conditional independence tests. We extend these algorithms to problems with continuous predictors and multi-class outcomes by developing and equipping them with an appropriate conditional independence test procedure for both nominal and ordinal multi-class targets. The test is based on multinomial logistic regression and employs the log-likelihood ratio test for model selection. We present a comparative experimental evaluation on seven real-world, high-dimensional gene-expression datasets. Within the scope of our analysis, the results indicate that the new conditional independence test allows the identification of smaller and better-performing signatures for multi-class outcome datasets than the current alternatives for performing the independence tests.
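The likelihood-ratio step of such a conditional independence test works as follows. To test X ⊥ Y | Z, fit a reduced model Y ~ Z and a full model Y ~ Z + X, then compare 2·(LL_full − LL_reduced) against a chi-square distribution.

This is a hypothetical sketch that assumes the two log-likelihoods come from already-fitted multinomial logistic regressions, so model fitting is not shown. It covers the nominal case only. Here each added continuous predictor contributes (C − 1) parameters for C classes. The chi-square tail is computed from the incomplete-gamma series, which loses precision for extremely small p-values.

```python
import math

def chi2_sf(x, df, max_iter=10000):
    """Upper-tail chi-square probability P(X >= x) with df degrees of freedom,
    via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_iter):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def lr_ci_test(loglik_reduced, loglik_full, n_classes, n_added_predictors=1):
    """Likelihood-ratio test of X independent of Y given Z for a nominal target.
    Adding predictors to a multinomial logit adds (C - 1) coefficients each."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    df = (n_classes - 1) * n_added_predictors
    return stat, chi2_sf(stat, df)
```

A feature selection loop would treat X as conditionally dependent on Y given Z when the returned p-value falls below a chosen significance level.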