Dataset Information

Next-generation RNA sequencing to determine changes in gene expression during breast cancer progression

ABSTRACT: Purpose: The goal of the study is to compare NGS-derived transcriptomes across an isogenic cell line model of breast cancer progression. By comparing the protein-coding and noncoding gene expression of normal versus tumorigenic breast cancer cell lines, we will be able to identify genes that show aberrant expression during breast cancer progression.

Methods: We isolated poly A+ RNA from four isogenic mammary epithelial cell lines representing various stages of breast cancer progression. The model system comprises four isogenic cell lines, categorized as M1-M4. M1 represents the normal, non-tumorigenic, immortalized MCF10A cells. Transfection of MCF10A with activated T24-HRAS and selection by xenografting generated the M2 (MCF10AT1k.cl2) cell line, which is highly proliferative and gives rise to premalignant lesions with the potential for neoplastic progression. M3 (MCF10Ca1h) and M4 (MCF10CA1a.cl1) were derived from occasional carcinomas arising from xenografts of M2 cells. M3 gives predominantly well-differentiated low-grade carcinomas on xenografting, while M4 gives rise to relatively undifferentiated carcinomas and colonizes the lung when the cells are injected into the tail vein. We performed paired-end deep sequencing (190-260 million reads/sample) of poly A+ RNA isolated from these cells, which were cultured as 3D acini in biological duplicates. Reads were trimmed for adapters and low-quality bases using Trimmomatic before alignment to the reference genome (Human - hg19) and the annotated transcripts using STAR. The average mapping rate across all samples is 96%, unique alignment is above 87%, and 3.74 to 4.07% of reads are unmapped. The mapping statistics were calculated using Picard. The samples have 0.59% ribosomal bases. Coding bases account for 67-72%, UTR bases for 23-26%, and mRNA bases for 94-96% in all samples. Library complexity was measured as the number of unique fragments among the mapped reads using Picard’s MarkDuplicates utility. The samples have 31-52% non-duplicate reads. In addition, gene expression was quantified for all samples using STAR/RSEM. Both the normalized counts and the raw counts are provided as part of the data delivery.

Results: Using an optimised data analysis workflow, we mapped ~190-250 million reads/sample and identified expression of 17396 protein-coding genes and 11509 long noncoding RNA genes. We initially compared gene expression between M1 and M4 cells. 4668 genes (2815 protein-coding and 1853 lncRNAs) showed a ~2-fold change in expression between M1 and M4 cells in both biological repeats. Of the 1853 deregulated lncRNAs, 1159 showed 2-fold upregulation in M4 cells in both repeats, whereas 694 displayed reduced levels in M4 compared to M1 cells. Further, we noticed that natural antisense transcripts (NATs) comprised one of the largest classes of lncRNAs (504 out of 1853) that showed deregulation in M4 cells.

Conclusion: Our study revealed differential expression of thousands of protein-coding and long noncoding RNAs during breast cancer progression using the isogenic cell line model system. This data set will act as a rich resource for downstream mechanistic studies to determine the role of these differentially expressed genes in breast cancer progression.
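The abstract reports genes with a roughly 2-fold change between M1 and M4 "in both biological repeats" but does not describe how the call was made. The following is a minimal sketch of one way such a filter could work, not the authors' pipeline. The pairing of replicate 1 with replicate 1, the pseudocount of 1, the function names, and the expression values are all assumptions made for illustration; only the gene counts in the final check come from the abstract.

```python
import math


def log2_fold_change(m4_value, m1_value, pseudocount=1.0):
    """log2(M4 / M1) with a pseudocount (an assumption) to avoid division by zero."""
    return math.log2((m4_value + pseudocount) / (m1_value + pseudocount))


def classify(m1_reps, m4_reps, threshold=1.0, pseudocount=1.0):
    """Return 'up', 'down', or None for one gene.

    A gene is called only if every replicate pair passes the threshold
    (|log2 FC| >= 1 corresponds to a 2-fold change). Pairing replicate i
    of M1 with replicate i of M4 is an assumption of this sketch.
    """
    lfcs = [
        log2_fold_change(m4, m1, pseudocount)
        for m1, m4 in zip(m1_reps, m4_reps)
    ]
    if all(lfc >= threshold for lfc in lfcs):
        return "up"
    if all(lfc <= -threshold for lfc in lfcs):
        return "down"
    return None


# Hypothetical normalized expression values: gene -> ((M1 rep1, M1 rep2), (M4 rep1, M4 rep2))
expression = {
    "geneA": ((10.0, 12.0), (50.0, 45.0)),  # higher in M4 in both repeats
    "geneB": ((40.0, 38.0), (8.0, 9.0)),    # lower in M4 in both repeats
    "geneC": ((20.0, 20.0), (45.0, 30.0)),  # passes in one repeat only
}

calls = {gene: classify(m1, m4) for gene, (m1, m4) in expression.items()}

# Consistency check on the counts reported in the abstract.
lncrna_up, lncrna_down, lncrna_total = 1159, 694, 1853
protein_coding, deregulated_total = 2815, 4668
counts_consistent = (
    lncrna_up + lncrna_down == lncrna_total
    and protein_coding + lncrna_total == deregulated_total
)
```

Requiring the threshold in every replicate pair, rather than on the replicate mean, keeps a gene like the hypothetical `geneC` from being called on the strength of a single repeat.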

ORGANISM(S): Homo sapiens

PROVIDER: GSE120796 | GEO | 2018/11/23

REPOSITORIES: GEO

Similar Datasets

Smad3 ChIP-chip analysis on human breast epithelial cells of the MCF10A series
2015-01-01 | GSE34271 | GEO

In vitro gene expression profile of TGFbeta-regulated genes in MCF10A-based xenograft model of breast cancer progression
2015-01-01 | GSE34270 | GEO

Genome-wide profiling of long noncoding RNA expression patterns in chemoresistant breast cancer cells
2016-05-28 | E-GEOD-81971 | biostudies-arrayexpress

Comparisons of epithelial and mesenchymal murine breast tumor cell lines
2009-04-01 | E-GEOD-13259 | biostudies-arrayexpress

Upregulated mRNAs and lncRNAs in chemoresistant breast cancer
2020-02-25 | E-MTAB-8787 | biostudies-arrayexpress

Whole-transcriptome analysis in breast cancer with or without lymph node metastasis
2021-12-31 | GSE163346 | GEO

Genome-wide profiling of long noncoding RNA expression patterns in chemoresistant breast cancer cells
2016-05-28 | GSE81971 | GEO

MANCR lncRNA modulates cell-cycle progression and metastasis in breast cancer cells by regulating the isoform-specific expression of nuclear Rho-GEF in cis
2024-09-13 | GSE257538 | GEO

Epigenomics-Based Identification of Estrogen-regulated Long Noncoding RNAs in ER+ Breast Cancer [ATAC-Seq]
2020-09-30 | GSE144925 | GEO

Epigenomics-Based Identification of Estrogen-regulated Long Noncoding RNAs in ER+ Breast Cancer [ChIP-Seq]
2020-09-30 | GSE144926 | GEO