Toward Signaling-Driven Biomarkers Immune to Normal Tissue Contamination.
ABSTRACT: The goal of this study was to discover a minimally invasive pathway-specific biomarker that is immune to normal cell mRNA contamination for diagnosing head and neck squamous cell carcinoma (HNSCC). Using Elsevier's MedScan natural language processing component of the Pathway Studio software and the TRANSFAC database, we produced a curated set of genes regulated by the signaling networks driving the development of HNSCC. The network and its gene targets provided prior probabilities for gene expression, which guided our CoGAPS matrix factorization algorithm to isolate patterns related to HNSCC signaling activity from a microarray-based study. Using patterns that distinguished normal from tumor samples, we identified a reduced set of genes to analyze with Top Scoring Pair in order to produce a potential biomarker for HNSCC. Our proposed biomarker comprises targets of the transcription factor (TF) HIF1A and the FOXO family of TFs coupled with genes that show remarkable stability across all normal tissues. Based on validation with novel data from The Cancer Genome Atlas (TCGA), measured by RNAseq, and bootstrap sampling, the biomarker for normal vs. tumor has an accuracy of 0.77, a Matthews correlation coefficient of 0.54, and an area under the curve (AUC) of 0.82.
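The Top Scoring Pair step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard formulation of the method: pick the gene pair whose relative ordering differs most between the two classes, then classify a new sample by that ordering alone. The function names, toy data, and 0/1 label coding are hypothetical and are not taken from the study.

```python
from itertools import combinations


def top_scoring_pair(samples, labels):
    """Find the gene pair whose ordering best separates two classes.

    samples: list of expression vectors, one per sample
    labels:  list of 0 (normal) or 1 (tumor), same length as samples
    Returns (score, i, j, tumor_when_less), where tumor_when_less is True
    if "gene i < gene j" is more frequent in tumor than in normal samples.
    """
    n_genes = len(samples[0])
    normal = [s for s, y in zip(samples, labels) if y == 0]
    tumor = [s for s, y in zip(samples, labels) if y == 1]
    best = None
    for i, j in combinations(range(n_genes), 2):
        # Fraction of samples in each class where gene i is below gene j.
        p_normal = sum(s[i] < s[j] for s in normal) / len(normal)
        p_tumor = sum(s[i] < s[j] for s in tumor) / len(tumor)
        score = abs(p_tumor - p_normal)
        if best is None or score > best[0]:
            best = (score, i, j, p_tumor > p_normal)
    return best


def predict(sample, pair):
    """Classify one sample (1 = tumor, 0 = normal) from the pair ordering."""
    _, i, j, tumor_when_less = pair
    return int((sample[i] < sample[j]) == tumor_when_less)


# Toy data (hypothetical): 4 samples x 3 genes.
samples = [[1.0, 5.0, 3.0], [2.0, 6.0, 3.1], [5.0, 1.0, 3.0], [6.0, 2.0, 2.9]]
labels = [0, 0, 1, 1]
pair = top_scoring_pair(samples, labels)
```

Because the rule depends only on the within-sample ordering of two genes, it is unaffected by any monotone rescaling of a sample's expression values, which is one reason rank-based classifiers are attractive when training and validation data come from different platforms.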
Project description:BACKGROUND:Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single cell data. However, these methods have greater computational costs than their gradient-based counterparts. These costs are often prohibitive for analysis of large single-cell datasets. Many such methods can be run in parallel, which enables this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis. RESULTS:We developed a new software framework for parallel matrix factorization in Version 3 of the CoGAPS R/Bioconductor package to overcome the computational limitations of Bayesian matrix factorization for single cell data analysis. This parallelization framework provides asynchronous updates for sequential updating steps of the algorithm to enhance computational efficiency. These algorithmic advances were coupled with new software architecture and sparse data structures to reduce the memory overhead for single-cell data. CONCLUSIONS:Altogether, our new software enhances the efficiency of the CoGAPS Bayesian matrix factorization algorithm so that it can analyze 1000 times more cells, enabling factorization of large single-cell data sets.
Project description:Coordinated Gene Activity in Pattern Sets (CoGAPS) provides an integrated package for isolating gene expression driven by a biological process, enhancing inference of biological processes from transcriptomic data. CoGAPS improves on other enrichment measurement methods by combining a Markov chain Monte Carlo (MCMC) matrix factorization algorithm (GAPS) with a threshold-independent statistic inferring activity on gene sets. The software is provided as open source C++ code built on top of JAGS software with an R interface. The R package CoGAPS and the C++ package GAPS-JAGS are provided open source under the GNU Lesser General Public License (LGPL) with a user's manual containing installation and operating instructions. CoGAPS is available through Bioconductor and depends on the rjags package available through CRAN to interface CoGAPS with GAPS-JAGS. URL: http://www.cancerbiostats.onc.jhmi.edu/cogaps.cfm .
Project description:LncRNAs are involved in the initiation and progression of cancer. However, the molecular mechanism and diverse clinical prognosis of MIR31HG in head and neck squamous cell carcinoma (HNSCC) are still unclear. Our previous microarray analysis showed that the lncRNA MIR31HG, which interacts with HIF1A, may play an oncogenic role in laryngeal squamous cell cancer (LSCC). In testing whether lncRNA MIR31HG serves as a poor prognostic factor and targets HIF1A to facilitate cell proliferation and tumorigenesis in human HNSCC, we found that MIR31HG and HIF1A were overexpressed in LSCC, and that MIR31HG overexpression, or co-expression of HIF1A-positive and p21-negative status, could serve as a poor prognostic factor for LSCC patients. We further confirmed that MIR31HG promoted cell proliferation and cell cycle progression and inhibited cell apoptosis in vitro and in vivo. The ingenuity pathway analysis and Western blot indicated that MIR31HG regulated cell cycle progression via HIF1A and p21 in HNSCC. The current results provide evidence for the role of MIR31HG in promoting HNSCC progression and identify MIR31HG as a prognostic biomarker and putative therapeutic target in HNSCC.
Project description:Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel patternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian NMF using the sparse MCMC algorithm CoGAPS. Additionally, a manual version of the GWCoGAPS algorithm contains analytic and visualization tools including patternMatcher, a Shiny web application. The decomposition in the manual pipeline can be replaced with any NMF algorithm, for further generalization of the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data. PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL. Supplementary data are available at Bioinformatics online.
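The idea of extracting pattern-specific genes from NMF weights can be illustrated with a minimal sketch. It is one simple way to do this and is not claimed to reproduce the patternMarkers statistic as implemented in the CoGAPS package. The sketch assumes each gene's weights are scaled by their maximum and the gene is assigned to the pattern whose unit vector lies closest to the scaled weights. The function name and toy weights are hypothetical.

```python
import math


def pattern_markers(amplitudes):
    """Assign each gene to the single pattern it most exclusively loads on.

    amplitudes: dict mapping gene name -> list of non-negative weights,
                one weight per pattern (a row of the NMF amplitude matrix).
    Returns a dict mapping pattern index -> list of (distance, gene),
    sorted so the most pattern-specific genes come first.
    """
    n_patterns = len(next(iter(amplitudes.values())))
    markers = {k: [] for k in range(n_patterns)}
    for gene, row in amplitudes.items():
        top = max(row)
        if top == 0:
            continue  # gene carries no weight in any pattern
        scaled = [w / top for w in row]
        # Euclidean distance from the scaled row to each unit vector e_k.
        dists = [
            math.sqrt(sum((w - (1.0 if idx == k else 0.0)) ** 2
                          for idx, w in enumerate(scaled)))
            for k in range(n_patterns)
        ]
        k_best = min(range(n_patterns), key=dists.__getitem__)
        markers[k_best].append((dists[k_best], gene))
    for k in markers:
        markers[k].sort()
    return markers


# Toy amplitude rows (hypothetical): three genes, two patterns.
toy = {"g1": [10.0, 0.5], "g2": [0.2, 4.0], "g3": [3.0, 3.0]}
result = pattern_markers(toy)
```

In this toy case g1 and g2 are each nearly exclusive to one pattern, while g3 loads equally on both and therefore ranks last among the genes assigned to its pattern. This shows why scaled, pattern-relative distances can surface markers that raw weights would obscure.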
Project description:Aberrant activation of signaling pathways controlled in normal epithelial cells by the epidermal growth factor receptor (EGFR) has been linked to cetuximab (a monoclonal antibody against EGFR) resistance in head and neck squamous cell carcinoma (HNSCC). To infer relevant and specific pathway activation downstream of EGFR from gene expression in HNSCC, we generated gene expression signatures using immortalized keratinocytes (HaCaT) subjected to either ligand stimulation or pharmacological inhibition of the signaling intermediaries PI-3-Kinase and MEK or transfected with EGFR, RELA/p65, or HRASVal12. The gene expression patterns that distinguished the various HaCaT variants and conditions were inferred using the Markov chain Monte Carlo (MCMC) matrix factorization algorithm Coordinated Gene Activity in Pattern Sets (CoGAPS). This approach inferred gene expression signatures with greater relevance to cell signaling pathway activation than the expression signatures inferred with standard linear models. Furthermore, the pathway signature generated using HaCaT-HRASVal12 was associated with the cetuximab treatment response in isogenic cetuximab-sensitive (UMSCC1) and -resistant (1CC8) cell lines. Our data suggest that the CoGAPS algorithm can generate gene expression signatures that are pertinent to downstream effects of receptor signaling pathway activation and may be useful in modeling resistance mechanisms to targeted therapies. 58 total RNA samples were collected from HaCaT cell lines with combinations of the following experimental conditions: forced expression of EGFR, RELA/p65, and HRAS-VAL12D; grown in PBS, serum-starved, and media stimulated with TNF or EGF; treated with gefitinib, LY294002, and U0126.
Project description:The integration between BioDAS ProServer and Automated Sequence Annotation Pipeline (ASAP) provides an interface for querying diverse annotation sources, chaining and linking results, and standardizing the output using the Distributed Annotation System (DAS) protocol. This interface allows pipeline plans in ASAP to be integrated into any system using HTTP and also allows the information returned by ASAP to be included in the DAS registry for use in any DAS-aware system. Three example implementations have been developed: the first accesses TRANSFAC information to automatically create gene sets for the Coordinated Gene Activity in Pattern Sets (CoGAPS) algorithm; the second integrates annotations from multiple array platforms and provides unified annotations in an R environment; and the third wraps the UniProt database for integration with the SPICE DAS client. Source code for ASAP 2.7 and the DAS 1.6 interface is available under the GNU public license. Proserver 2.20 is free software available from SourceForge. Scripts for installation and configuration on Linux are provided at our website: http://www.rits.onc.jhmi.edu/dbb/custom/A6/
Project description:Using high-throughput analyses and the TRANSFAC database, we characterized TF signatures of head and neck squamous cell carcinoma (HNSCC) subgroups by inferential analysis of target gene expression, correcting for the effects of DNA methylation and copy number. Using this discovery pipeline, we determined that human papillomavirus-related (HPV+) and HPV- HNSCC differed significantly based on the activity levels of key TFs including AP1, STATs, NF-κB and p53. Immunohistochemical analysis confirmed that HPV- HNSCC is characterized by co-activated STAT3 and NF-κB pathways and functional studies demonstrate that this phenotype can be effectively targeted with combined anti-NF-κB and anti-STAT therapies. These discoveries correlate strongly with previous findings connecting STATs, NF-κB and AP1 in HNSCC. We identified five top-scoring pair biomarkers from STATs, NF-κB and AP1 pathways that distinguish HPV+ from HPV- HNSCC based on TF activity and validated these biomarkers on TCGA and on independent validation cohorts. We conclude that a novel approach to TF pathway analysis can provide insight into therapeutic targeting of patient subgroup for heterogeneous disease such as HNSCC.
Project description:Head and Neck Squamous Cell Carcinoma (HNSCC) is the fifth most common cancer, annually affecting over half a million people worldwide. Presently, there are no accepted biomarkers for clinical detection and surveillance of HNSCC. In this work, a comprehensive genome-wide analysis of epigenetic alterations in primary HNSCC tumors was employed in conjunction with cancer-specific outlier statistics to define novel biomarker genes which are differentially methylated in HNSCC. The 37 identified biomarker candidates were top-scoring outlier genes with prominent differential methylation in tumors, but with no signal in normal tissues. These putative candidates were validated in independent HNSCC cohorts from our institution and TCGA (The Cancer Genome Atlas). Using the top candidates, ZNF14, ZNF160, and ZNF420, an assay was developed for detection of HNSCC cancer in primary tissue and saliva samples with 100% specificity when compared to normal control samples. Given the high detection specificity, the analysis of ZNF DNA methylation in combination with other DNA methylation biomarkers may be useful in the clinical setting for HNSCC detection and surveillance, particularly in high-risk patients. Several additional candidates identified through this work can be further investigated toward future development of a multi-gene panel of biomarkers for the surveillance and detection of HNSCC.
Project description:This study examined the significance of neutrophil gelatinase-associated lipocalin (NGAL) in diagnosing head and neck squamous cell carcinoma (HNSCC) and predicting regional metastasis. We first used a GEO dataset to analyze NGAL gene expression in HNSCC. Then, we summarized the characteristics of patients retrospectively selected in the clinic. Expression of NGAL protein in human HNSCC tumor, lymph node and normal samples was analyzed using immunohistochemistry. Next, we further investigated NGAL expression in a tissue microarray to analyze the relationship between NGAL protein expression and TNM stage. Finally, we tested NGAL protein expression in head and neck cancer cell lines. Analysis of the GEO dataset showed that NGAL gene expression in HNSCC was lower than that in normal tissue (P<0.01). There was no statistically significant difference of NGAL gene expression between T-stage and N-stage (P>0.05). NGAL protein expression in tumor was lower than that in normal tissue (P<0.01). There was no statistically significant difference of NGAL protein expression between the metastasis group and the non-metastasis group (P>0.05). Expression of NGAL protein was not correlated with TNM stage of HNSCC. Aggressive HNSCC cell lines had lower NGAL protein expression. Our data demonstrated that the expression of NGAL protein was correlated with tumorigenesis of HNSCC, but not with regional metastasis. It may serve as a novel biomarker for prognostic evaluation of patients with HNSCC.
Project description:BACKGROUND:Few diagnostic and prognostic biomarkers are available for head-and-neck squamous cell carcinoma (HNSCC). Long non-coding RNAs (lncRNAs) have shown promise as biomarkers in other cancer types and in some cases functionally contribute to tumor development and progression. Here, we searched for lncRNAs useful as biomarkers in HNSCC. METHODS:Public datasets were mined for lncRNA candidates. Two independent HNSCC tissue sets and a bladder cancer tissue set were analyzed by RT-qPCR. Effects of lncRNA overexpression or downregulation on cell proliferation, clonogenicity, migration and chemosensitivity were studied in HNSCC cell lines. RESULTS:Data mining revealed prominently CASC9, a lncRNA significantly overexpressed in HNSCC tumor tissues according to the TCGA RNAseq data. Overexpression was confirmed by RT-qPCR analyses of patient tissues from two independent cohorts. CASC9 expression discriminated tumors from normal tissues with even higher specificity than HOTAIR, a lncRNA previously suggested as an HNSCC biomarker. Specificity of HNSCC detection by CASC9 was further improved by combination with HOTAIR. Analysis of TCGA pan-cancer data revealed significant overexpression of CASC9 across different other entities including bladder, liver, lung and stomach cancers and especially in squamous cell carcinoma (SCC) of the lung. By RT-qPCR analysis we furthermore detected stronger CASC9 overexpression in pure SCC of the urinary bladder and mixed urothelial carcinoma with squamous differentiation than in pure urothelial carcinomas. Thus, CASC9 might represent a general diagnostic biomarker and particularly for SCCs. Unexpectedly, up- or downregulation of CASC9 expression in HNSCC cell lines with low or high CASC9 expression, respectively, did not result in significant changes of cell viability, clonogenicity, migration or chemosensitivity. CONCLUSIONS:CASC9 is a promising biomarker for HNSCC detection. 
While regularly overexpressed, however, this lncRNA does not seem to act as a major driver of development or progression in this tumor.