Genomics

Dataset Information


Next-generation RNA sequencing to determine changes in gene expression during breast cancer progression


ABSTRACT: Purpose: The goal of the study is to compare NGS-derived transcriptomes in an isogenic cell line model of breast cancer progression. By comparing protein-coding and noncoding gene expression in normal versus tumorigenic breast cancer cell lines, we aim to identify genes that show aberrant expression during breast cancer progression.

Methods: We isolated poly(A)+ RNA from four isogenic mammary epithelial cell lines representing different stages of breast cancer progression. The model system comprises four isogenic cell lines, designated M1-M4. M1 represents the normal, non-tumorigenic, immortalized MCF10A cells. Transfection of MCF10A with activated T24-HRAS, followed by selection by xenografting, generated the M2 (MCF10AT1k.cl2) cell line, which is highly proliferative and gives rise to premalignant lesions with the potential for neoplastic progression. M3 (MCF10CA1h) and M4 (MCF10CA1a.cl1) were derived from occasional carcinomas arising from xenografts of M2 cells. On xenografting, M3 gives rise predominantly to well-differentiated, low-grade carcinomas, whereas M4 gives rise to relatively undifferentiated carcinomas and colonizes the lung when injected into the tail vein. We performed paired-end deep sequencing (190-260 million reads/sample) of poly(A)+ RNA isolated from these cells cultured as 3D acini in biological duplicate. Reads were trimmed of adapters and low-quality bases with Trimmomatic and then aligned to the human reference genome (hg19) and the annotated transcripts with STAR. The average mapping rate across all samples is 96%, unique alignment is above 87%, and 3.74-4.07% of reads are unmapped. Mapping statistics were calculated with Picard. The samples have 0.59% ribosomal bases; coding bases account for 67-72%, UTR bases for 23-26%, and mRNA bases for 94-96% in all samples.
Library complexity was measured as the number of unique fragments among mapped reads using Picard's MarkDuplicates tool; the samples have 31-52% non-duplicate reads. Gene expression was quantified for all samples with STAR/RSEM, and both normalized and raw counts are provided as part of the data delivery.

Results: Using an optimized data analysis workflow, we mapped ~190-250 million reads/sample and identified expression of 17,396 protein-coding genes and 11,509 long noncoding RNA (lncRNA) genes. We first compared gene expression between M1 and M4 cells: 4,668 genes (2,815 protein-coding genes and 1,853 lncRNAs) showed a ~2-fold change in expression between M1 and M4 cells in both biological repeats. Of the 1,853 deregulated lncRNAs, 1,159 showed 2-fold upregulation in M4 cells in both repeats, whereas 694 showed reduced levels in M4 compared with M1 cells. Natural antisense transcripts (NATs) were among the largest classes of lncRNAs deregulated in M4 cells (504 of 1,853).

Conclusion: Our study revealed differential expression of thousands of protein-coding genes and lncRNAs during breast cancer progression in the isogenic cell line model system. This dataset will serve as a rich resource for downstream mechanistic studies of the roles of these differentially expressed genes in breast cancer progression.
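The deregulation criterion described in the abstract (a roughly 2-fold change between M1 and M4 in both biological repeats, split into up- and downregulated sets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-gene normalized expression values (for example, the normalized counts delivered with the dataset) for two replicates per cell line, a hypothetical pseudocount of 1 to avoid taking the log of zero, and a cutoff of at least 2-fold. The function names, gene names, and values are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format: gene -> (replicate 1, replicate 2) normalized expression.
Expression = Dict[str, Tuple[float, float]]


def log2_fold_change(m4: float, m1: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of M4 over M1; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((m4 + pseudocount) / (m1 + pseudocount))


def classify_deregulated(
    m1: Expression, m4: Expression, min_fold: float = 2.0
) -> Dict[str, List[str]]:
    """Call a gene up/down only if it passes the fold cutoff in BOTH replicates."""
    threshold = math.log2(min_fold)
    up: List[str] = []
    down: List[str] = []
    for gene in sorted(m1.keys() & m4.keys()):
        # One log2 fold change per biological replicate (index 0 and 1).
        lfcs = [log2_fold_change(m4[gene][rep], m1[gene][rep]) for rep in (0, 1)]
        if all(lfc >= threshold for lfc in lfcs):
            up.append(gene)
        elif all(lfc <= -threshold for lfc in lfcs):
            down.append(gene)
    return {"up": up, "down": down}


if __name__ == "__main__":
    # Made-up toy values, not data from this dataset.
    m1 = {"GENE_A": (10.0, 12.0), "GENE_B": (40.0, 38.0),
          "GENE_C": (20.0, 20.0), "GENE_D": (5.0, 50.0)}
    m4 = {"GENE_A": (30.0, 40.0), "GENE_B": (9.0, 10.0),
          "GENE_C": (22.0, 18.0), "GENE_D": (20.0, 10.0)}
    print(classify_deregulated(m1, m4))
    # → {'up': ['GENE_A'], 'down': ['GENE_B']}
```

Requiring the same direction of change in both replicates is what excludes GENE_D in the toy example, mirroring the "in both biological repeats" condition. A replicate-aware statistical test (for example DESeq2 or edgeR on the raw counts) is a common alternative to a fixed fold-change cutoff.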

ORGANISM(S): Homo sapiens

PROVIDER: GSE120796 | GEO | 2018/11/23

REPOSITORIES: GEO

Similar Datasets

2015-01-01 | GSE34271 | GEO
2015-01-01 | GSE34270 | GEO
2016-05-28 | E-GEOD-81971 | biostudies-arrayexpress
2009-04-01 | E-GEOD-13259 | biostudies-arrayexpress
2020-02-25 | E-MTAB-8787 | biostudies-arrayexpress
2021-12-31 | GSE163346 | GEO
2016-05-28 | GSE81971 | GEO
2020-09-30 | GSE144926 | GEO
2020-09-30 | GSE144925 | GEO
2015-02-04 | E-GEOD-60660 | biostudies-arrayexpress