Project description:Somatic mutations can disrupt splicing regulatory elements and have dramatic effects on cancer genes, yet the functional consequences of mutations located in extended splice regions are difficult to predict. Here, we use a deep neural network (SpliceAI) to characterize the landscape of splice-altering mutations in cancer. In our in-house series of 401 liver cancers, SpliceAI uncovers 1244 cryptic splice mutations, located outside essential splice sites, that validate at a high rate (66%) in matched RNA-seq data. We then extend the analysis to a large pan-cancer cohort of 17 714 tumors, revealing >100 000 cryptic splice mutations. Taking these mutations into account increases the power of driver gene discovery, revealing 126 new candidate driver genes. It also reveals new driver mutations in known cancer genes, doubling the frequency of splice alterations in tumor suppressor genes. Mutational signature analysis suggests mutational processes that could preferentially give rise to splice mutations in each cancer type, with an enrichment of signatures related to clock-like processes and DNA repair deficiency. Altogether, this work sheds light on the causes and impact of cryptic splice mutations in cancer, and highlights the power of deep learning approaches to better annotate the functional consequences of mutations in oncology.
Project description:Background and aim: Competing endogenous RNA (ceRNA) is believed to play vital roles in tumorigenesis. The goal of this study was to screen prognostic biomarkers in lung adenocarcinoma (LUAD). Methods: Differentially expressed genes (DEGs) were collected from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases using GEO2R and the "limma" package in R, respectively. Overlapping DEGs were subjected to functional enrichment and protein-protein interaction (PPI) network analysis to discover significant candidate genes. Using a comprehensive analysis, we constructed an mRNA-mediated ceRNA network. Survival was assessed using Kaplan-Meier analysis. Statistical analysis was used to further assess the prognostic value of the studied genes. Results: Integrated analysis of GSE32863 and TCGA data identified a total of 886 overlapping DEGs, including 279 up-regulated and 607 down-regulated genes. Based on the top-ranked candidate gene in the PPI network, we identified TPX2, which was enriched in the cell division signaling pathway. In addition, 35 differentially expressed miRNAs (DEmiRNAs) were predicted to target TPX2, of which only 7 were identified as prognostic biomarkers in LUAD. Then, 30 differentially expressed lncRNAs (DElncRNAs) were predicted to bind these 7 DEmiRNAs. Finally, we found that 7 DElncRNAs were correlated with overall survival (all p < 0.05). Furthermore, elevated TPX2 was strongly correlated with worse survival among 458 samples. Univariate and multivariate Cox analyses showed that TPX2 may act as an independent prognostic factor in LUAD (p < 0.05). Pathway enrichment results suggested that TPX2 may facilitate tumorigenesis by participating in several cancer-related signaling pathways in LUAD, especially the Notch signaling pathway. Conclusions: TPX2-related lncRNAs and miRNAs are related to survival in LUAD. Seven lncRNAs, seven miRNAs and TPX2 may serve as prognostic biomarkers in LUAD.
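The overlap step in the TPX2 study above (genes differentially expressed in both the GEO and TCGA analyses) can be sketched as a direction-aware set intersection. This is a minimal illustration only: the input format (gene symbol mapped to log2 fold change), the function name `overlap_degs`, and the toy values are assumptions, not the authors' pipeline or data.

```python
def overlap_degs(geo_degs, tcga_degs):
    """Return (up, down): genes changed in the same direction in both inputs.

    Each argument maps a gene symbol to a log2 fold change (assumed format).
    Genes missing from the second input are treated as unchanged.
    """
    up = {g for g, lfc in geo_degs.items() if lfc > 0 and tcga_degs.get(g, 0) > 0}
    down = {g for g, lfc in geo_degs.items() if lfc < 0 and tcga_degs.get(g, 0) < 0}
    return up, down


# Toy values for illustration; not taken from GSE32863 or TCGA.
geo = {"TPX2": 2.1, "GENE_A": -1.8, "GENE_B": 1.2, "GENE_C": -2.0}
tcga = {"TPX2": 1.7, "GENE_A": -2.2, "GENE_B": -1.1, "GENE_D": 3.0}

up, down = overlap_degs(geo, tcga)
print(sorted(up), sorted(down))
```

Genes whose direction disagrees between the two sources (here the hypothetical `GENE_B`) are dropped, which is one common but not universal convention.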
Project description:Lung adenocarcinoma is the most common type of primary lung cancer, but the regulatory mechanisms during carcinogenesis remain unclear. The identification of regulatory modules for lung adenocarcinoma has become one of the hotspots of bioinformatics. In this paper, multiple deep neural network (DNN) models were constructed using expression data to identify regulatory modules for lung adenocarcinoma in biological networks. First, the mRNAs, lncRNAs and miRNAs with significant differences in expression levels between tumor and non-tumor tissues were obtained. mRNA DNN models were established and optimized to mine candidate mRNAs that contributed significantly to the DNN models and were central in an interaction network. Another DNN model was then constructed, and potential ceRNAs were screened based on the contribution of each RNA to the model. Finally, three modules composed of miRNAs and their regulated mRNAs and lncRNAs with the same regulation direction were identified as regulatory modules that regulate the initiation of lung adenocarcinoma through ceRNA relationships. They were validated by literature and functional enrichment analysis. The effectiveness of these regulatory modules was evaluated in an independent lung adenocarcinoma dataset. The regulatory modules identified in this study provide a reference for regulatory mechanisms during carcinogenesis.
Project description:Lung adenocarcinoma (LUAD) is the most common histologic subtype of lung cancer. Early-stage patients have a 30-50% probability of metastatic recurrence after surgical treatment. Here, we propose a new computational framework, Interpretable Biological Pathway Graph Neural Networks (IBPGNET), based on pathway hierarchy relationships to predict LUAD recurrence and explore the internal regulatory mechanisms of LUAD. IBPGNET can integrate different omics data efficiently and provide global interpretability. In addition, our experimental results show that IBPGNET outperforms other classification methods in 5-fold cross-validation. IBPGNET identified PSMC1 and PSMD11 as genes associated with LUAD recurrence, and their expression levels were significantly higher in LUAD cells than in normal cells. The knockdown of PSMC1 and PSMD11 in LUAD cells increased their sensitivity to afatinib and decreased cell migration, invasion and proliferation. In addition, the cells showed significantly lower EGFR expression, indicating that PSMC1 and PSMD11 may mediate therapeutic sensitivity through EGFR expression.
Project description:Although several biomarkers have been proposed to predict the response of patients with lung adenocarcinoma (LUAD) to immune checkpoint blockade (ICB) therapy, challenges such as test platform uniformity, cutoff value definition, and low frequencies restrict their effective clinical application. Here, we used deep neural networks (DNNs) based on somatic mutations to predict the clinical benefit of ICB in LUAD patients undergoing immunotherapy. We trained and validated the DNN-based predictive model in three cohorts. Kaplan-Meier estimates were used to compare overall survival (OS) and progression-free survival (PFS) between specific subgroups. We then analyzed multiple data types, including immune cell infiltration, programmed death ligand 1 (PD-L1) expression, and tumor mutational burden (TMB), in cohorts from public LUAD databases and in patients treated with immunotherapy. The DNN algorithm identified two classification groups (C1 and C2) with respect to ICB efficacy in the training set and the two validation sets. Patients in C1 showed remarkably longer OS and PFS on programmed death 1 (PD-1) inhibitors. The C1 group was significantly associated with increased immune cell infiltration and higher expression of immune checkpoints, activated T-effector signatures, and the interferon gamma signature. The C1 group also exhibited significantly higher TMB and more neoantigens, transversions, or transitions than the C2 group. This work suggests that DNN-based classification using somatic mutations in LUAD could serve as a potentially predictive approach in developing a strategy for anti-PD-1/PD-L1 immunotherapy.
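Several of these abstracts compare subgroups with Kaplan-Meier estimates. For readers unfamiliar with the estimator, the following is a minimal pure-Python sketch of the product-limit calculation. The function name and the toy follow-up data are assumptions for illustration, and the sketch omits confidence intervals and the log-rank test that a real analysis would need.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up durations (any consistent unit).
    events: 1 if the event (e.g. death or progression) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the conditional survival at time t.
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve


# Toy data for illustration only: four subjects, one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Censored subjects reduce the risk set without producing a step in the curve, which is why the drop at time 3 is larger than it would be in an uncensored sample of the same size.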
Project description:Aim: Lung cancer is one of the most common cancers in China and has a high mortality rate. Most patients have lost the opportunity to undergo surgery by the time they are diagnosed. Aberrant metabolism is closely associated with tumorigenesis. We aimed to identify an effective metabolism-related prediction model for assessing prognosis based on The Cancer Genome Atlas (TCGA) and GSE116959 datasets. Methods: TCGA and the GSE116959 dataset from Gene Expression Omnibus were used to obtain lung adenocarcinoma (LUAD) data. Additionally, we obtained metabolism-related genes (MRGs) from the GeneCards database. First, we extracted differentially expressed genes from the LUAD data using R. We then selected the shared differentially expressed genes, including 168 downregulated and 77 upregulated genes. Finally, 218 differentially expressed MRGs (DEMRGs) were used to perform functional enrichment analysis and to construct a protein-protein interaction network with the help of Cytoscape and the Search Tool for the Retrieval of Interacting Genes database. Cytoscape was used to visualize the densely connected regions of the network. Univariate and Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression analyses were then performed to identify the overall survival (OS)-related DEMRGs and to build a 10-DEMRG prognosis model. The prognostic value, tumor immunity relevance, and molecular mechanism were further investigated. A nomogram incorporating the signature, age, gender, and TNM stage was established. Results: A 10-DEMRG model was established to forecast the OS of LUAD through LASSO regression analysis. This prognostic signature stratified LUAD patients into low-risk and high-risk groups. The receiver operating characteristic curve and Kaplan-Meier analysis indicated good performance of the DEMRG signature in predicting OS in the TCGA dataset. Univariate and multivariate Cox regression also revealed that the DEMRG signature was an independent prognostic factor in LUAD. The risk score was substantially related to clinical parameters of LUAD patients, including age and stage. Immune analysis showed that the risk score was associated with some immune cells and immune checkpoints. The nomogram also verified the clinical value of the DEMRG signature. Conclusion: In this study, we constructed a DEMRG signature and established a prognostic nomogram that is robust and reliable for predicting OS in LUAD. Overall, the findings could help with therapeutic customization and personalized therapies.
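A gene-signature risk score of the kind described above is typically a weighted sum of expression values, with the weights taken from the fitted Cox model. The sketch below shows that calculation and a split into low-risk and high-risk groups. The coefficients, gene names, patient IDs, and the choice of the median as the cutoff are all assumptions for illustration; the abstract does not state the cutoff or the coefficients.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Compute sum(coef_g * expr_g) over signature genes for each patient.

    expression:   {patient_id: {gene: expression_value}}
    coefficients: {gene: Cox regression coefficient} (hypothetical values here)
    """
    return {
        pid: sum(coefficients[g] * values[g] for g in coefficients)
        for pid, values in expression.items()
    }


def stratify(scores):
    """Label patients 'high' if their score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy two-gene signature and four patients; values are illustrative only.
coefs = {"G1": 0.5, "G2": -0.3}
expr = {
    "P1": {"G1": 2.0, "G2": 1.0},
    "P2": {"G1": 1.0, "G2": 2.0},
    "P3": {"G1": 3.0, "G2": 0.0},
    "P4": {"G1": 0.0, "G2": 1.0},
}

scores = risk_scores(expr, coefs)
groups = stratify(scores)
print(groups)
```

A median cutoff gives balanced groups by construction; studies sometimes use an optimized cutoff instead, which needs correction for the extra search it involves.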
Project description:Deep learning (DL) is a breakthrough technology for medical imaging, but it has high sample size requirements and interpretability issues. Using a pretrained DL model through a radiomics-guided approach, we propose a methodology for stratifying the prognosis of lung adenocarcinomas based on pretreatment CT. Our approach allows DL to be applied with smaller sample size requirements and enhanced interpretability. Baseline radiomics and DL models for the prognosis of lung adenocarcinomas were developed and tested using a local cohort (n = 617). The DL models were further tested in an external validation cohort (n = 70). The local cohort was divided into training and test cohorts. A radiomics risk score (RRS) was developed using Cox-LASSO. Three pretrained DL networks derived from natural images were used to extract the DL features. The features were then guided by radiomics: only DL features with high correlations with the radiomics features and low Bonferroni-corrected p-values were retained. The retained DL features were subjected to Cox-LASSO to construct DL risk scores (DRS). The risk groups stratified by the RRS and DRS differed significantly in the training, test, and validation cohorts. The DL features were interpreted using existing radiomics features, and the texture features explained the DL features well.
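The radiomics-guided retention rule above (keep DL features that correlate strongly with radiomics features and have low Bonferroni-corrected p-values) can be sketched as a simple filter. The sketch assumes the correlations and raw p-values have already been computed elsewhere. The thresholds `r_min` and `alpha`, the feature names, and the choice to count every tested pair in the Bonferroni factor are assumptions, since the abstract does not give them.

```python
def retain_features(pairs, r_min=0.5, alpha=0.05):
    """Return sorted DL feature names that pass both filters for at least one pair.

    pairs: list of (dl_feature, radiomics_feature, correlation, raw_p_value).
    Bonferroni correction multiplies each raw p-value by the number of tests,
    capped at 1.0.
    """
    n_tests = len(pairs)
    kept = set()
    for dl_name, _radiomics_name, r, p in pairs:
        p_adjusted = min(1.0, p * n_tests)
        if abs(r) >= r_min and p_adjusted <= alpha:
            kept.add(dl_name)
    return sorted(kept)


# Hypothetical precomputed correlations; names and numbers are illustrative only.
pairs = [
    ("dl_0", "texture_a", 0.82, 1e-6),
    ("dl_1", "texture_a", 0.10, 0.4),
    ("dl_2", "shape_b", -0.65, 0.02),   # strong correlation, fails after correction
    ("dl_3", "texture_c", 0.70, 0.001),
]

kept = retain_features(pairs)
print(kept)
```

The hypothetical `dl_2` shows why the correction matters: its raw p-value is below 0.05, but the adjusted value (0.02 times 4 tests) is not.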
Project description:Background: The anti-cancer efficacy of immunotherapy in lung adenocarcinoma (LUAD) has been demonstrated. However, predicting who will benefit from this expensive treatment remains a challenge. Materials and methods: A group of patients (N = 250) diagnosed with LUAD and receiving immunotherapy was retrospectively studied. They were randomly divided into a training dataset (80%) and a test dataset (20%). The training dataset was used to train neural network models to predict objective response rate (ORR), disease control rate (DCR), responder status (progression-free survival time > 6 months), and overall survival (OS) probability. The models were evaluated on both the training and test datasets and later packaged into a tool. Results: In the training dataset, the tool achieved an area under the receiver operating characteristic curve (AUC) of 0.9016 for ORR, 0.8570 for DCR, and 0.8395 for responder prediction. In the test dataset, the AUC was 0.8173 for ORR, 0.8244 for DCR, and 0.8214 for responder prediction. For OS prediction, the AUC was 0.6627 in the training dataset and 0.6357 in the test dataset. Conclusions: This neural network-based immunotherapy efficacy prediction tool for LUAD patients could predict ORR, DCR, and responder status well.
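The AUC values reported above summarize how often a model ranks a true responder above a non-responder. A minimal sketch of that pairwise definition follows. The function name and toy labels and scores are assumptions for illustration, and the quadratic loop is meant for clarity rather than efficiency.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (e.g. responder), 0 for negative.
    scores: model outputs, where higher means more likely positive.
    Equals the probability that a random positive outscores a random
    negative, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predictions for six patients; values are illustrative only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to random ranking, which puts the reported OS values (0.6627 and 0.6357) well below the ORR and DCR results.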
Project description:Objective: The aims of this study were to screen for gene mutations that can predict the risk of cigarette smoking-related lung adenocarcinoma (LUAD) and to evaluate their prognostic significance. Methods: Clinical data and genetic information were retrieved from the TCGA database, and patients with LUAD were divided into three groups (never smoking, light smoking, and heavy smoking) according to cigarette smoking dose. Differentially mutated genes (DMGs) of each group were analyzed. The functions of the DMGs in the three smoking groups were evaluated by GO function and KEGG pathway analysis. Driver gene and protein variation effect analyses of the DMGs were performed to further screen key genes. The associations of the expression and mutation of those genes with survival were analyzed and visualized with Kaplan-Meier curves. Results: DMGs for the different smoking doses were identified. Driver and deleterious mutations in the DMGs were screened, and a gene interaction network was constructed. DMGs with driver and deleterious mutations that were associated with overall survival in heavy smoking patients were considered candidate genes for novel markers of smoking-related LUAD. The final novel risk factor gene was identified as MYH7, and high expression of MYH7 in LUAD correlated with patients' gender, lymph node metastasis, T stage, and clinical stage. Conclusions: MYH7 is a novel biomarker for heavy smoking-related LUAD; it is significantly correlated with the prognosis of lung cancer and is related to its clinical characteristics.
Project description:Immunotherapy has shown significant promise as a treatment for cancers such as lung cancer and melanoma. However, only 10%-30% of patients respond to treatment with immune checkpoint blockers (ICBs), underscoring the need for biomarkers to predict response to immunotherapy. Here, we developed DeepGeneX, a computational framework that uses advanced deep neural network modeling and feature elimination to reduce single-cell RNA-seq data on ∼26,000 genes to six of the most important genes (CCR7, SELL, GZMB, WARS, GZMH, and LGALS1) that accurately predict response to immunotherapy. We also discovered that the macrophage population with high LGALS1 and WARS expression represents a biomarker for ICB therapy nonresponders, suggesting that these macrophages may be a target for improving ICB response. Taken together, DeepGeneX enables biomarker discovery and provides an understanding of the molecular basis for the model's predictions.