Project description: Objective: Non-small cell lung cancer is a leading cause of cancer death worldwide, and histopathological evaluation plays the primary role in its diagnosis. However, the morphological patterns associated with its molecular subtypes have not been systematically studied. To bridge this gap, we developed a quantitative histopathology analytic framework to objectively identify the histologic types and gene expression subtypes of non-small cell lung cancer. Materials and methods: We processed whole-slide histopathology images of lung adenocarcinoma (n = 427) and lung squamous cell carcinoma (n = 457) patients in The Cancer Genome Atlas. We built convolutional neural networks to classify histopathology images, evaluated their performance by the areas under the receiver operating characteristic curves (AUCs), and validated the results in an independent cohort (n = 125). Results: To establish neural networks for quantitative image analyses, we first built convolutional neural network models to distinguish tumor regions from adjacent dense benign tissues (AUCs > 0.935) and to recapitulate expert pathologists' diagnoses (AUCs > 0.877), with the results validated in an independent cohort (AUCs = 0.726-0.864). We further demonstrated that quantitative histopathology morphology features identified the major transcriptomic subtypes of both adenocarcinoma and squamous cell carcinoma (P < .01). Discussion: Our study is the first to classify the transcriptomic subtypes of non-small cell lung cancer using fully automated machine learning methods. Our approach does not rely on prior pathology knowledge and can objectively discover novel, clinically relevant histopathology patterns. The developed procedure is generalizable to other tumor types or diseases.
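The AUCs quoted above measure how well a model's scores rank positive cases (for example, tumor tiles) above negative ones. The following minimal illustration assumes binary labels (1 = positive) and one score per case. It computes the AUC through its Mann-Whitney interpretation. The function name and toy scores are hypothetical and are not taken from the study's pipeline.

```python
from collections.abc import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical tumor-probability scores for four slides (1 = tumor, 0 = benign).
print(auc([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0]))  # prints 0.75
```

The pairwise loop keeps the definition visible. A sort-based rank computation gives the same value with better scaling on large slide collections.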
Project description: The current staging system for non-small cell lung cancer (NSCLC) is inadequate for predicting outcome. A risk score, defined as the sum of each gene's expression value multiplied by a weight estimated from univariate Cox proportional hazards regression, can be useful. The aim of this study was to analyze survival-related genes with the TaqMan Low-Density Array (TLDA) and a risk score to explore a gene signature in lung cancer. A total of 96 NSCLC specimens were collected and randomly assigned to a training (n = 48) or a testing cohort (n = 48). A panel of 219 survival-associated genes from published studies was used to develop a 6-gene risk score. The risk score was used to classify patients into high- or low-risk groups, and survival analysis was performed. Cox models were used to evaluate independent prognostic factors. A 6-gene signature comprising ABCC4, ADRBK2, KLHL23, PDS5A, UHRF1 and ZNF551 was identified. The risk score was an independent prognostic factor in both the training (HR = 3.14, 95% CI: 1.14-8.67, p = 0.03) and testing cohorts (HR = 5.42, 95% CI: 1.56-18.84, p = 0.01). In merged public datasets (GSE50081, GSE30219, GSE31210, GSE19188, GSE37745, GSE3141 and GSE31908), the risk score was also an independent prognostic factor (HR = 1.50, 95% CI: 1.25-1.80, p < 0.0001). A risk score generated from the expression of a small number of genes performed well in predicting overall survival and may be useful in routine clinical practice.
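The risk score described above is a weighted sum of signature-gene expression values. The following minimal Python sketch assumes the per-gene weights are already available as univariate Cox coefficients. It also assumes patients are split at the training-cohort median. The abstract does not state the cutoff, so the median is an assumption. The gene names and weights are placeholders, not the published 6-gene values.

```python
from collections.abc import Mapping, Sequence
from statistics import median


def risk_score(expression: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Linear risk score: sum over signature genes of (Cox weight * expression)."""
    return sum(weights[gene] * expression[gene] for gene in weights)


def assign_risk_groups(train_scores: Sequence[float], scores: Sequence[float]) -> list[str]:
    """Label each score 'high' or 'low' risk, using the training-cohort median as cutoff.

    The median cutoff is an assumption made for this sketch.
    """
    cutoff = median(train_scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Placeholder weights for a hypothetical two-gene signature (NOT the published values).
weights = {"GENE_A": 0.8, "GENE_B": -0.5}
patients = [
    {"GENE_A": 2.0, "GENE_B": 1.0},
    {"GENE_A": 0.5, "GENE_B": 3.0},
    {"GENE_A": 1.5, "GENE_B": 0.0},
]
train_scores = [risk_score(p, weights) for p in patients]
print(assign_risk_groups(train_scores, train_scores))
```

In a train/test design, the cutoff is fixed on the training scores and then applied unchanged to the testing cohort. This is why `assign_risk_groups` takes the two score lists separately.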
Project description: Background: Current histopathological classification and TNM staging have limited accuracy in predicting survival and stratifying patients for appropriate treatment. The goal of the study was to determine whether the expression pattern of functionally important regulatory proteins can add value for more accurate classification and prognostication of non-small cell lung cancer (NSCLC). Methods: The expression of 108 proteins and phosphoproteins in 30 paired NSCLC samples was assessed using Protein Pathway Array (PPA). The differentially expressed proteins were further confirmed using a tissue microarray (TMA) containing 94 NSCLC samples and were correlated with clinical data and survival. Results: Twelve of the 108 proteins (p-CREB(Ser133), p-ERK1/2(Thr202/Tyr204), Cyclin B1, p-PDK1(Ser241), CDK4, CDK2, HSP90, CDC2p34, β-catenin, EGFR, XIAP and PCNA) were selected to build a predictor that classified normal and tumor samples with 97% accuracy. Five proteins (CDC2p34, HSP90, XIAP, CDK4 and CREB) were confirmed to be differentially expressed between NSCLC (n=94) and benign lung tumors (n=19). Over-expression of CDK4 and HSP90 in tumors correlated with favorable overall survival in all NSCLC patients. Over-expression of p-CREB(Ser133) and CREB in NSCLC correlated with favorable survival in smokers and in patients with squamous cell carcinoma, respectively. Finally, four proteins (CDK4, HSP90, p-CREB and CREB) were used to calculate a risk score for each individual patient with NSCLC to predict survival. Conclusion: In summary, our data demonstrated a broad disturbance of functionally important regulatory proteins in NSCLC, and some of these proteins can be selected as clinically useful biomarkers for diagnosis, classification and prognosis.
Project description: Lung cancer is the leading cause of cancer deaths in the United States. Patients with early-stage lung cancer have the best prognosis with surgical removal of the tumor. However, the disease is often asymptomatic until it is advanced, and there are no effective blood-based screening methods for early detection of lung cancer in at-risk populations. We explored the lipid profiles of blood plasma exosomes using ultra-high-resolution Fourier transform mass spectrometry (UHR-FTMS) for early detection of the prevalent non-small cell lung cancers (NSCLC). Exosomes are nanovesicles released by various cells and tumor tissues that mediate important biological functions such as immune modulation and tumor development. Plasma exosomal lipid profiles were acquired from 39 normal and 91 NSCLC subjects (44 early stage and 47 late stage). We applied two multivariate statistical methods, Random Forest (RF) and Least Absolute Shrinkage and Selection Operator (LASSO), to classify the data. For the RF method, the Gini importance of the assigned lipids was calculated to select the 16 lipids with the highest importance. Using the LASSO method, 7 features were selected based on a grouped LASSO penalty. Using the selected lipid features, the areas under the receiver operating characteristic curve for early- and late-stage cancer versus normal subjects were 0.85 and 0.88 for RF and 0.79 and 0.77 for LASSO, respectively. These results show the value of RF and LASSO for metabolomics data-based biomarker development, as they provide robust and independent classifiers with sparse data sets. Application of LASSO and Random Forests identifies lipid features that successfully distinguish early-stage lung cancer patients from healthy individuals.
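Gini importance, used above to rank lipids, credits each feature with the impurity reduction its splits achieve. This minimal sketch rests on stated assumptions. The split records (feature, node size, impurity decrease) are taken as given rather than grown by an actual forest. The lipid names are hypothetical. For a whole forest, the per-tree scores would still need to be averaged across trees.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of the class labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent: Sequence[Hashable], left: Sequence[Hashable],
                      right: Sequence[Hashable]) -> float:
    """Drop in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


def gini_importance(splits: Iterable[tuple[str, int, float]], n_total: int) -> dict[str, float]:
    """Per-feature importance for one tree from (feature, node_size, decrease) records.

    Each split is weighted by the fraction of samples reaching its node, and the
    totals are normalised to sum to 1 (mean decrease in impurity).
    """
    totals: dict[str, float] = {}
    for feature, node_size, decrease in splits:
        totals[feature] = totals.get(feature, 0.0) + (node_size / n_total) * decrease
    norm = sum(totals.values()) or 1.0
    return {feature: value / norm for feature, value in totals.items()}


# A perfect split of a balanced NSCLC/normal node removes all impurity.
parent = ["NSCLC", "NSCLC", "normal", "normal"]
print(impurity_decrease(parent, ["NSCLC", "NSCLC"], ["normal", "normal"]))  # prints 0.5
```

Features that split large, mixed nodes cleanly accumulate the most importance. Ranking by this score and keeping the top features mirrors the selection of 16 lipids described above.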
Project description: Non-small-cell lung cancer (NSCLC) demonstrates remarkable molecular diversity. With the completion of The Cancer Genome Atlas (TCGA), there is an opportunity for systematic analyses of the entire TCGA NSCLC cohort, including comparisons and contrasts between different disease subsets. On the basis of multidimensional and comprehensive molecular characterization (including DNA methylation and copy number, and RNA and protein expression), 1023 NSCLC cases were classified using a 'cluster-of-clusters' analytic approach. These comprised 519 cases from the TCGA adenocarcinoma (AD) project and 504 from the TCGA squamous cell carcinoma (SQCC) project. Patterns from TCGA NSCLC subsets were examined in independent external databases, including the PROSPECT (Profiling of Resistance patterns and Oncogenic Signaling Pathways in Evaluation of Cancers of the Thorax) NSCLC data set. Nine genomic subtypes of NSCLC were identified, three within SQCC and six within AD. SQCC subtypes were associated with transcriptional targets of SOX2 or p63. One predominantly AD subtype (with a large proportion of SQCC) shared molecular features with neuroendocrine tumors. Two AD subtypes manifested a CpG island methylator phenotype. Three AD subtypes showed high p38 and mTOR pathway activation. AD subtypes associated with low differentiation showed relatively worse prognosis. SQCC subtypes and two of the AD subtypes expressed cancer testis antigen genes. Three AD subtypes expressed several immune checkpoint genes, including PDL1 and PDL2, corresponding to patterns of greater immune cell infiltration. Subtype associations for several immune-related markers (PD1, PDL1, CD3 and CD8) were confirmed in the PROSPECT cohort using immunohistochemistry. NSCLC molecular subtypes have therapeutic implications and support a personalized approach to NSCLC management based on molecular characterization.
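In a cluster-of-clusters analysis, each sample's cluster assignment on every platform is recoded as 0/1 indicator columns. The concatenated matrix is then clustered a second time. This sketch shows only the encoding step plus a simple Hamming disagreement between samples. The platform names, labels and choice of distance are illustrative assumptions. The second-level clustering used for TCGA NSCLC is not reproduced.

```python
from collections.abc import Mapping, Sequence


def coca_matrix(assignments: Mapping[str, Sequence[int]]) -> list[list[int]]:
    """Build the cluster-of-clusters indicator matrix (samples x all platform clusters).

    `assignments` maps a platform name to the cluster label of every sample, with
    samples in the same order on every platform. Each platform contributes one
    0/1 column per cluster it found.
    """
    n_samples = len(next(iter(assignments.values())))
    rows: list[list[int]] = [[] for _ in range(n_samples)]
    for platform, labels in assignments.items():
        if len(labels) != n_samples:
            raise ValueError(f"platform {platform!r} has a different number of samples")
        clusters = sorted(set(labels))
        for i, label in enumerate(labels):
            rows[i].extend(1 if label == c else 0 for c in clusters)
    return rows


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of indicator columns on which two samples differ (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical per-platform cluster labels for four samples.
toy = {"mRNA": [0, 0, 1, 1], "methylation": [0, 0, 1, 2], "copy_number": [1, 1, 0, 0]}
matrix = coca_matrix(toy)
print(matrix[0], disagreement(matrix[0], matrix[1]))  # prints [1, 0, 1, 0, 0, 0, 1] 0.0
```

Samples that co-cluster on most platforms end up close in this matrix. The second clustering pass groups such samples into integrated subtypes.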
Project description: Background: The current staging system is imprecise for prognostic prediction in early-stage non-small cell lung cancer (NSCLC). This study aimed to develop a robust prognostic signature for early-stage NSCLC that identifies patients at high risk of poor outcome and informs specific treatment decisions. Methods: In the present study, a comprehensive genome-wide profiling analysis was conducted using a retrospective pool of early-stage NSCLC patient data. The data came from previously published Gene Expression Omnibus (GEO) datasets (GSE31210, GSE37745, and GSE50081) and The Cancer Genome Atlas (TCGA). Cox proportional hazards models were implemented to determine the association between gene expression levels and overall patient survival in each dataset. The genes common to all datasets were selected as candidate prognostic genes. A risk score model was developed and validated using four independent datasets and the entire cohort. Kaplan-Meier analysis with the log-rank test was used to assess survival differences. Results: Univariate Cox proportional hazards regression analysis of each dataset identified candidate protective genes: 2280 genes in GSE31210, 762 in GSE37745, 871 in GSE50081, and 666 in TCGA. The same analysis identified candidate risk genes: 2131 genes in GSE31210, 913 in GSE37745, 1107 in GSE50081, and 997 in TCGA. Eight common genes (7 mRNAs and 1 lncRNA) were associated with overall survival. Using stepwise multivariate Cox analysis, an 8-gene prognostic signature (CDCP1, HMMR, TPX2, CIRBP, HLF, KBTBD7, SEC24B-AS1, and SH2B1) for early-stage NSCLC was developed. Patients in the high-risk group had shorter overall survival than those in the low-risk group. Multivariate regression and stratified analysis suggested that the prognostic power of the 8-gene signature was independent of other clinical factors. Furthermore, the 8-gene signature achieved AUC values of 0.726, 0.701, 0.725 and 0.650 in GSE31210, GSE37745, GSE50081 and TCGA, respectively. Moreover, combining the 8-gene signature with stage resulted in better patient classification for survival prediction and treatment decisions. Conclusion: This study developed a robust gene signature with great value for prognostic prediction in early-stage NSCLC, which may contribute to patient classification and personalized treatment decisions.
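The Kaplan-Meier and log-rank comparison of high- and low-risk groups can be sketched in plain Python. This sketch assumes survival times in one consistent unit and event indicators of 1 (death) or 0 (censored). It uses a chi-square reference with one degree of freedom for the p-value. The toy cohorts are hypothetical, and the functions are illustrative rather than the study's code.

```python
import math
from collections.abc import Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate at each distinct event time.

    events[i] is 1 if patient i died at times[i] and 0 if censored at that time.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + [
        (t, e, 1) for t, e in zip(times_b, events_b)
    ]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n_a = sum(1 for u, _, g in pooled if g == 0 and u >= t)  # group A at risk
        n = sum(1 for u, _, _ in pooled if u >= t)  # everyone at risk
        d_a = sum(1 for u, e, g in pooled if g == 0 and u == t and e)  # group A deaths
        d = sum(1 for u, e, _ in pooled if u == t and e)  # all deaths at t
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected**2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical cohorts: high-risk patients die earlier than low-risk patients.
high = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
low = ([6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
chi2, p = logrank_test(*high, *low)
print(kaplan_meier(*high)[0], chi2, p)
```

The log-rank statistic compares observed and expected deaths in one group at every event time. A small p-value therefore indicates that the two Kaplan-Meier curves differ over the whole follow-up, not at a single time point.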
Project description: The tumor microenvironment strongly influences cancer development, progression, and metastasis. The role of carcinoma-associated fibroblasts (CAFs) in these processes and their clinical impact have not been studied systematically in non-small cell lung carcinoma (NSCLC). We established primary cultures of CAFs and matched normal fibroblasts (NFs) from 15 resected NSCLCs. We demonstrate that CAFs have a greater ability than NFs to enhance the tumorigenicity of lung cancer cell lines. Microarray gene-expression analysis of the 15 matched CAF and NF cell lines identified 46 differentially expressed genes. These genes encode proteins that are significantly enriched for extracellular proteins regulated by the TGF-β signaling pathway. We identified a subset of 11 genes (13 probe sets) that formed a prognostic gene-expression signature, which was validated in multiple independent NSCLC microarray datasets. Functional annotation using protein-protein interaction analyses of these and published cancer stroma-associated gene-expression changes revealed prominent involvement of the focal adhesion and MAPK signaling pathways. Fourteen (30%) of the 46 genes were also differentially expressed in laser-capture-microdissected corresponding primary tumor stroma compared with the matched normal lung. Six of these 14 genes could be induced by TGF-β1 in NFs. The results establish the prognostic impact of CAF-associated gene-expression changes in NSCLC patients.
Project description: Background: The tumor microenvironment (TME) is involved in the development and progression of lung carcinomas. A deeper understanding of the TME landscape would offer insight into prognostic biomarkers and potential therapeutic targets. To this end, we aimed to identify the TME components of lung cancer and develop a prognostic signature to predict overall survival (OS). Methods: Expression data were retrieved from The Cancer Genome Atlas (TCGA) database, and differentially expressed TME-related genes were identified between tumor and normal tissues. Nonnegative matrix factorization (NMF) clustering was then used to identify two distinct subtypes. Results: Our analysis yielded a panel of seven TME-related genes as a candidate signature set. With this panel, our model showed that the high-risk group had shorter survival. This model was further validated in an independent cohort drawn from the Gene Expression Omnibus (GEO) database (GSE50081 and GSE13213). Additionally, we integrated clinical factors and the risk score to construct a nomogram for predicting prognosis. Our data suggested that tumor tissues from high-risk patients contained fewer infiltrating immune cells but more fibroblasts. These patients also exhibited a worse response to immunotherapy. Conclusion: The signature set proposed in this work could be an effective model for estimating OS in lung cancer patients. Analysis of the TME may provide novel diagnostic, prognostic and therapeutic opportunities.
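NMF clustering factorizes a nonnegative genes × samples matrix V into W (genes × k metagenes) and H (k × samples). It then assigns each sample to the metagene with the largest coefficient in its column of H. The abstract does not specify the loss, initialization or number of restarts. This sketch therefore assumes Lee and Seung's multiplicative updates for the squared Frobenius loss, a single seeded random start, and toy data. Published NMF subtyping usually aggregates many restarts into a consensus matrix.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_loss(v, w, h):
    """Squared Frobenius norm ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative V (genes x samples) into W (genes x k) and H (k x samples)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps guards against division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_samples)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n_genes)]
    return w, h


def assign_clusters(h):
    """Assign each sample (column of H) to the metagene with the largest coefficient."""
    return [max(range(len(h)), key=lambda a: h[a][j]) for j in range(len(h[0]))]


# Toy genes x samples matrix with two obvious sample groups (hypothetical data).
V = [
    [5.0, 4.0, 0.1, 0.2],
    [4.0, 5.0, 0.2, 0.1],
    [0.1, 0.2, 5.0, 4.0],
    [0.2, 0.1, 4.0, 5.0],
]
W, H = nmf(V, k=2, iterations=300)
print(assign_clusters(H))
```

Multiplicative updates keep every entry nonnegative without an explicit projection step. This is the main reason the scheme is a common default for NMF-based subtyping.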
Project description: Recurrent gene mutations and fusions in cancer patients are likely to be associated with cancer progression or recurrence (Vogelstein et al., Science 340, 1546-1558 (2013)). In this study, we investigated gene mutations and fusions that occurred recurrently in patients with stage I non-small-cell lung cancer (NSCLC). Targeted exome sequencing was performed to profile the variants. Their fidelity was confirmed at the gene and pathway levels by comparison with data for stage I lung cancer patients obtained from The Cancer Genome Atlas (TCGA). Next, we identified several features associated with cancer recurrence: prognostic gene mutations (ATR, ERBB3, KDR, and MUC6), fusions (GOPC-ROS1 and NTRK1-SH2D2A), and the VEGF signaling pathway. To infer the functional implications of the recurrent variants in early-stage cancers, we investigated their selection patterns. The variants were shown to be under positive selection, implying a selective advantage for cancer progression. Specifically, high selection scores were observed in the variants with significantly high risks of recurrence. Taken together, the results of this study enabled us to identify recurrent gene mutations and fusions in a stage I NSCLC cohort and to demonstrate their positive selection, which has implications for cancer recurrence.
Project description: Background: The JBR.10 trial demonstrated a significant survival benefit from adjuvant cisplatin/vinorelbine (ACT) in stage IB-II NSCLC (HR 0.69, p=0.04), but stage IB patients did not derive significant benefit (HR 0.94, p=0.79). We hypothesized that expression profiling could identify stage-independent subgroups of patients who might benefit from adjuvant chemotherapy. Methods: Gene expression profiling was conducted on mRNA isolated from frozen JBR.10 tumor samples (from patients either under observation [OBS] or treated with ACT). We identified the minimum gene set yielding the greatest separation between good- and poor-prognosis subgroups among OBS patients. This gene signature was used to classify patients as high or low risk for death after surgery and to predict the effect of ACT. The prognostic gene signature was additionally tested on ACT patients and on publicly available microarray datasets. Results: A 15-gene signature separated OBS patients into equal high- and low-risk subgroups with significantly different prognoses (HR 15.02, 95% CI 5.12-44.04, p=0.0001). The signature was prognostic in both stage IB and stage II. It was also predictive of improved survival following ACT in high-risk patients (HR 0.33, 95% CI 0.17-0.63, p=0.0005), but not in low-risk patients (HR 3.67, 95% CI 1.22-11.06, p=0.0133; interaction p=0.0001). The prognostic effect of the signature was validated in two independent gene expression datasets comprising 169 stage I-II adenocarcinoma and 106 squamous cell carcinoma patients. Conclusions: This microarray-based 15-gene prognostic expression signature is stage- and histology-independent and may select early-stage NSCLC patients who are most likely to benefit from adjuvant chemotherapy with cisplatin/vinorelbine. Keywords: Expression profiling by microarray; prognosis prediction. 90 samples.
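The hazard ratios reported above come from Cox proportional hazards models. For a single binary covariate, such as a high- versus low-risk signature call, the hazard ratio can be sketched as exp(β), with β maximizing the Cox partial likelihood. This minimal version assumes Breslow handling of tied times and Newton-Raphson starting from β = 0. It omits confidence intervals, interaction terms and multivariable adjustment, and the toy survival data are hypothetical.

```python
import math
from collections.abc import Sequence


def cox_hazard_ratio(times: Sequence[float], events: Sequence[int], group: Sequence[int],
                     max_iter: int = 50, tol: float = 1e-9) -> float:
    """Hazard ratio exp(beta) for a 0/1 covariate from a univariate Cox model.

    Maximises the Cox partial likelihood (Breslow handling of ties) with
    Newton-Raphson, starting from beta = 0.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, group):
            if not e_i:
                continue  # censored patients only contribute through risk sets
            # Risk set: everyone still under observation at the event time t_i.
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, group):
                if t_j >= t_i:
                    r = math.exp(beta * x_j)
                    s0 += r
                    s1 += x_j * r
                    s2 += x_j * x_j * r
            mean = s1 / s0
            score += x_i - mean  # first derivative of the log partial likelihood
            information += s2 / s0 - mean * mean  # minus the second derivative
        if information == 0.0:
            break  # covariate is constant within every risk set
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)


# Hypothetical cohort: high-risk patients (group 1) tend to die earlier.
times = [1, 3, 5, 7, 2, 4, 6, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
group = [1, 1, 1, 1, 0, 0, 0, 0]
print(cox_hazard_ratio(times, events, group))
```

Recoding the groups (1 - group) yields the reciprocal hazard ratio, because a constant shift in the linear predictor cancels out of the partial likelihood.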