Dataset Information

Construction of a prognostic 6-gene signature for breast cancer based on multi-omics and single-cell data

ABSTRACT:

SUBMITTER: Xing Z

PROVIDER: S-EPMC10698552 | biostudies-literature | 2023 Jan

REPOSITORIES: biostudies-literature


Similar Datasets

Project description:
Purpose: To identify a gene signature for the prognosis of breast cancer using high-throughput analysis.
Methods: RNA-Seq, single nucleotide polymorphism (SNP) and copy number variation (CNV) data, together with clinical follow-up information, were downloaded from The Cancer Genome Atlas (TCGA) and randomly divided into a training set and a verification set. Genes related to breast cancer prognosis and differentially expressed genes (DEGs) with CNV or SNP were screened from the training set, then integrated for feature selection with RandomForest to identify robust biomarkers. Finally, a gene-related prognostic model was established and its performance was verified in the TCGA test set, a Gene Expression Omnibus (GEO) validation set and breast cancer subtypes.
Results: A total of 2287 prognosis-related genes, 131 genes with amplified copy numbers, 724 genes with copy number deletions, and 280 genes with significant mutations screened from Genomic Variants were closely correlated with the development of breast cancer. A total of 120 candidate genes were obtained by integrating genes from Genomic Variants with those related to prognosis, and 6 characteristic genes (CD24, PRRG1, IQSEC3, MRGPRX, RCC2, and CASP8) were top-ranked by RandomForest feature selection. Notably, several of these have previously been reported to be associated with the progression of breast cancer. Cox regression analysis was performed to establish a 6-gene signature, which can stratify the risk of samples from the training set, test set and external validation set. Moreover, the five-year survival AUC of the model was higher than 0.65 in both the training set and the validation set. Thus, the 6-gene signature developed in the current study could serve as an independent prognostic factor for breast cancer patients.
Conclusion: This study constructed a 6-gene signature as a novel prognostic marker for predicting the survival of breast cancer patients, providing new diagnostic/prognostic biomarkers and therapeutic targets for breast cancer patients.
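As an illustration of how a Cox-derived gene signature is commonly used to stratify samples, the following minimal Python sketch computes a risk score as a weighted sum of expression values and splits samples at the median score. The coefficients and the median cutoff are assumptions made for illustration only; the description above does not report the fitted coefficients or the cutoff rule used in the study.

```python
from statistics import median

# HYPOTHETICAL Cox coefficients for the six genes named in the abstract.
# These numbers are invented for illustration and are NOT the study's values.
COEFFICIENTS = {
    "CD24": 0.12,
    "PRRG1": 0.08,
    "IQSEC3": 0.15,
    "MRGPRX": 0.10,
    "RCC2": 0.20,
    "CASP8": -0.18,
}


def risk_score(expression):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(samples):
    """Label each sample 'high' or 'low' risk using the median score as cutoff.

    The median split is an assumed convention, not one stated in the source.
    """
    scores = {sample_id: risk_score(expr) for sample_id, expr in samples.items()}
    cutoff = median(scores.values())
    return {
        sample_id: ("high" if score > cutoff else "low")
        for sample_id, score in scores.items()
    }


# Toy expression profiles: every gene set to the same value per sample.
toy_samples = {
    name: {gene: level for gene in COEFFICIENTS}
    for name, level in [("a", 0.0), ("b", 1.0), ("c", 2.0), ("d", 3.0)]
}
groups = stratify(toy_samples)
```

In practice the coefficients would come from a multivariate Cox model fitted on the training set, and the same coefficients and cutoff would then be applied unchanged to the test and external validation sets.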

Project description: N-7 methylguanine (m7G) is one of the most common RNA base modifications in post-transcriptional regulation and participates in multiple processes during the mRNA life cycle, such as transcription, mRNA splicing and translation. However, its expression and prognostic value in uterine corpus endometrial carcinoma (UCEC) have not been systematically studied. In this paper, gene expression profiles, clinical data of UCEC patients, somatic mutations and copy number variants (CNVs) are obtained from The Cancer Genome Atlas (TCGA) and UCSC Xena. By analyzing the expression differences of m7G-related mRNAs in UCEC and plotting correlation network maps, a risk score model composed of four m7G-related mRNAs (NSUN2, NUDT3, LARP1 and NCBP3) is constructed using least absolute shrinkage and selection operator (LASSO), univariate and multivariate Cox regression to assess prognosis and immune response. The relationship between the m7G-related mRNAs and the clinical prognosis of UCEC is analyzed using the Kaplan-Meier method, receiver operating characteristic (ROC) curves, principal component analysis (PCA), t-SNE, decision curve analysis (DCA) and a nomogram. High risk is significantly correlated with poorer overall survival (OS) in patients with UCEC (P < 0.001), and the risk score is an independent risk factor for OS. Differentially expressed genes between the high- and low-risk groups are identified with R software, and functional analysis and pathway enrichment analysis are performed. Single sample gene set enrichment analysis (ssGSEA), immune checkpoints, m6A-related genes, tumor mutation burden (TMB), stem cell correlation, Tumor Immune Dysfunction and Exclusion (TIDE) scores and drug sensitivity are also used to study the risk model. In addition, 3 genotypes are obtained by consensus clustering, which are significantly related to OS and progression-free survival (PFS) (P < 0.001). The deconvolution algorithm CIBERSORT is applied to calculate the proportions of 22 tumor-infiltrating immune cells (TIC) in UCEC patients, and the ESTIMATE algorithm is applied to estimate the abundance of immune and stromal components. In summary, m7G-related mRNAs may serve as potential biomarkers for UCEC prognosis and may promote UCEC occurrence and development by regulating the cell cycle and immune cell infiltration. They are expected to become potential therapeutic targets for UCEC.
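The Kaplan-Meier method named above estimates survival probability over time from follow-up data that may include censored patients. A minimal sketch of the product-limit estimator in plain Python follows; the function name and the toy follow-up data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients drop out of the risk set after t.
        at_risk -= leaving[t]
    return curve


# Invented toy data: four patients, the second one censored at time 2.
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

To compare high- and low-risk groups, the estimator would be run separately on each group and the curves compared, typically with a log-rank test (not shown here).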


