Project description:To test the global effects of CpG island-centered gene regulation on gene expression profiles, poly-A positive (pA+) RNA-seq data from diverse tissues and cell lines were gathered and profiled. All available mouse pA+ RNA-seq data (3,818 samples) were catalogued and downloaded on May 5, 2015. After excluding single-cell RNA-seq experiments and experiments with few detectably expressed genes (fewer than 5,000 genes with an RPKM of 0.5 or higher), 1,524 high-quality RNA-seq samples were used. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned using STAR 2.4.2 to the mouse and human genomes (mm9 and hg19, respectively). Gene expression was calculated as RPKM values using rpkmforgenes.py (Ramsköld et al., 2009).
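The sample filter described above (retain a sample only if at least 5,000 genes reach an RPKM of 0.5) can be sketched as follows. This is a minimal illustration, not the submitters' code: the function names and the in-memory layout (a dict mapping sample ID to a dict of gene-level RPKM values) are assumptions made for the example.

```python
def count_expressed_genes(rpkm_by_gene, min_rpkm=0.5):
    """Count genes whose RPKM is at or above the threshold."""
    return sum(1 for value in rpkm_by_gene.values() if value >= min_rpkm)


def select_high_quality_samples(samples, min_rpkm=0.5, min_genes=5000):
    """Return IDs of samples with at least `min_genes` expressed genes.

    `samples` maps sample ID -> {gene ID: RPKM}; this layout is a
    hypothetical choice for the sketch, not taken from the record.
    """
    return [
        sample_id
        for sample_id, rpkm_by_gene in samples.items()
        if count_expressed_genes(rpkm_by_gene, min_rpkm) >= min_genes
    ]


if __name__ == "__main__":
    # Toy data with a lowered gene threshold so the example stays small.
    toy = {
        "sample_a": {"g1": 1.0, "g2": 0.5, "g3": 0.1},
        "sample_b": {"g1": 0.0, "g2": 0.4, "g3": 2.0},
    }
    print(select_high_quality_samples(toy, min_genes=2))
```

The defaults mirror the thresholds stated in the description; single-cell experiments would have to be excluded separately from the sample metadata.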
Project description:We reanalyzed published RNA-seq data to study 1) the genomic landscape surrounding transcription start sites in relation to gene expression activity and 2) changes in gene expression upon transcription factor (MYBL1, ATF4) depletion. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned using STAR 2.4.2 to the mouse and human genomes (mm9 and hg19, respectively). Gene expression was calculated as RPKM values using rpkmforgenes.py (Ramsköld et al., 2009).
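For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) follows the standard textbook definition sketched below. This is not a reproduction of rpkmforgenes.py, which handles details such as annotation models and read assignment that are outside this example; the function name and arguments are illustrative assumptions.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of gene per million mapped reads.

    Equivalent to read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 100 reads on a 2 kb gene in a library of 10 million mapped reads.
    print(rpkm(100, 2000, 10_000_000))
```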
Project description:To elucidate the general rules for gene localization and regulation mediated by CpG islands, we reanalyzed published ChIP-seq data for the CXXC domain, H3K9me3, KDM2A, SUV39H1, ATF4, MYBL1, MYOD1, SPI1, and CTCF. Raw data were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI). FASTQ files were extracted with the SRA Toolkit version 2.5.5 and aligned using Bowtie 2.2.5 to the mouse and human genomes (mm9 and hg19, respectively). Factor binding sites were identified with the Model-based Analysis of ChIP-Seq (MACS 1.4.2) peak caller using a p-value cutoff of 1e-5.
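When working with peak tables downstream, it can help to translate the 1e-5 p-value cutoff into a score threshold. The sketch below assumes the peak table reports significance as -10*log10(p-value), the convention used in MACS 1.x peak output; verify this against the actual column header before relying on it. The helper names are hypothetical.

```python
import math


def pvalue_to_score(p_value):
    """Convert a p-value to a -10*log10(p) score (assumed MACS 1.x convention)."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]")
    return -10.0 * math.log10(p_value)


def passes_cutoff(peak_score, p_cutoff=1e-5):
    """True if a peak's score is at least as significant as the cutoff.

    A small tolerance guards against floating-point rounding at the boundary.
    """
    return peak_score >= pvalue_to_score(p_cutoff) - 1e-9


if __name__ == "__main__":
    print(pvalue_to_score(1e-5))   # score corresponding to the 1e-5 cutoff
    print(passes_cutoff(62.3))     # a peak more significant than the cutoff
```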
Project description:RNA-seq was performed to quantify host gene-expression changes in human embryonic lung fibroblast (HELF) cells following infection with the human cytomegalovirus (HCMV) HAN strain. HELF cells harvested at 72 hours post infection and mock-infected controls were profiled, with two biological replicates per condition (HELF_1/2 for mock; HCMV_HELF_1/2 for infected). Libraries were sequenced on an Illumina platform with paired-end 150-bp reads, and reads were aligned to the human reference genome (GRCh38) for gene-level quantification. The submission includes a cross-sample expression matrix (raw counts and FPKM) and a differential-expression results table comparing infected versus mock conditions, enabling reuse of the dataset for downstream analyses. Raw FASTQ files are deposited in SRA under the corresponding BioProject and BioSample accessions.
Project description:Purpose: To characterize transcriptional profiles of murine cytomegalovirus (MCMV)-infected allografts after renal transplantation. Methods: RNA was isolated from murine allografts and native kidneys, with and without MCMV infection. Libraries were generated and paired-end 150-bp sequencing was performed on the HiSeq 4000 (Illumina) (Supplementary Methods). Each sample was aligned to the GRCm38.p4 assembly of the mouse reference from NCBI using version 2.6.0c of the RNA-Seq aligner STAR. Transcript features were identified from the GFF file provided with the assembly from NCBI, and raw coverage counts were calculated using HTSeq. The raw RNA-Seq gene expression data were normalized, and post-alignment statistical analyses were performed using DESeq2 and custom analysis scripts written in R. Comparisons of gene expression and associated statistical analyses were made between conditions of interest using the normalized read counts. All fold change values are expressed as test condition/control condition, where values less than 1 are denoted as the negative of their inverse. Results: The QIAGEN Ingenuity Pathway Analysis (IPA) software was used for canonical pathway and differential gene expression analyses. IPA showed that, compared to MCMV-infected native kidneys, transplantation of MCMV-infected kidneys led to significant changes in 5502 genes (adjusted p-value < 0.05) involved in 391 canonical pathways. The Th17 activation pathway showed 107 differentially expressed genes. The Th1 pathway was one of the most highly upregulated pathways observed in the MCMV-infected allografts. Conclusions: Transcripts for Th1/Th17 cell-associated activation and signaling are differentially expressed in MCMV-infected kidneys after allogeneic transplantation.
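The fold change convention stated in the Methods (test/control, with ratios below 1 reported as the negative of their inverse) can be illustrated with a short sketch. The function name is hypothetical, and the submitters' own R scripts are not reproduced here.

```python
def signed_fold_change(test_value, control_value):
    """Return test/control, or -(control/test) when the ratio is below 1.

    Under this convention, a halving of expression is reported as -2
    rather than 0.5, and no values fall strictly between -1 and 1.
    """
    if test_value <= 0 or control_value <= 0:
        raise ValueError("both values must be positive")
    ratio = test_value / control_value
    return ratio if ratio >= 1 else -1.0 / ratio


if __name__ == "__main__":
    print(signed_fold_change(200.0, 100.0))  # up-regulated: 2.0
    print(signed_fold_change(50.0, 100.0))   # down-regulated: -2.0
```

Note that the input values must be positive normalized counts; how zero counts are handled is not stated in the description.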
Project description:To investigate differentially expressed genes in HEK293T cells upon FKBP5 knockdown, total RNA was extracted from HEK293T cells and analysed by RNA-seq. FKBP5 knockdown samples (n = 2) are reported in this study; scramble control samples (n = 2) were reported earlier (Yadav et al., Cell Reports 2019; NCBI Genbank SRA accession no. PRJNA512165).
Project description:We present a proteomics dataset based on nanoLC-MS/MS shotgun proteomics with data-independent acquisition on an Orbitrap hybrid instrument, enabling comparison of protein levels in human carotid endarterectomy specimens with atheromatous plaques, complicated lesions, and healthy vasculature. Study approval, as well as the collection and selection of study specimens, was described in our previous publication (https://doi.org/10.1089/ars.2020.8234). Briefly, atherosclerotic samples were obtained from patients by carotid endarterectomy, and healthy carotid arteries (H) were harvested from cadavers of suicide or fatal trauma victims without cardiovascular disease. All samples were classified by a pathologist according to American Heart Association guidelines as atheromatous (A) or as having complicated lesions (C), and were identical to those that met the specified study criteria and were included earlier in high-throughput global transcriptomic analyses by RNA-seq. Specifically, the inclusion criteria were as follows: samples were received within 1 h after endarterectomy, no RNA or protein degradation had occurred, no blood clot was present in the artery, and samples were collected appropriately. This dataset is a follow-up to a proteomics dataset that used the same samples and the same sample processing protocol but relied on data-dependent acquisition (dataset identifier PXD038922, doi:10.6019/PXD038922). For further comparison, RNA-seq data are also available in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database under the accession number PRJNA594843.
Project description:The Sequence Read Archive (SRA) contains over 52 terabases, or 482 billion reads, from Drosophila melanogaster (as of June 2018). These data are massively underused by the community and include 14,423 RNA-Seq samples, roughly 7 times the size of modENCODE. Currently, the major challenge is finding high-quality datasets that are suitable for inclusion in new studies. To help the community overcome this hurdle, we re-processed all D. melanogaster RNA-Seq SRA experiments (SRXs) using an identical workflow. This workflow uses a data-driven approach to identify technical metadata (i.e., strandedness and layout) for each sample in order to optimize mapping parameters. The workflow generates QC metrics and coverage tracks based on the dm6 assembly, and calculates gene-level, junction-level, and intergenic counts against FlyBase r6.11. This resource will allow any researcher to visualize browser tracks for any publicly available dataset, quickly identify high-quality datasets for use in their own research, and download identically processed count tables. There is a treasure trove of underused data sitting in the SRA, and this work addresses the first challenge in making data integration a common laboratory practice.
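One way a data-driven strandedness call could work is to compare how many reads align to the annotated (sense) strand of genes versus the opposite strand. The sketch below is an assumption-laden illustration, not the workflow's actual method: the function name, the labels, and the 0.75 threshold are all hypothetical choices.

```python
def infer_strandedness(sense_reads, antisense_reads, threshold=0.75):
    """Classify a library from sense/antisense read counts over annotated genes.

    `threshold` (hypothetical) is the fraction of reads that must fall on
    one strand before the library is called stranded.
    """
    total = sense_reads + antisense_reads
    if total == 0:
        return "undetermined"
    sense_fraction = sense_reads / total
    if sense_fraction >= threshold:
        return "same_strand"
    if sense_fraction <= 1 - threshold:
        return "opposite_strand"
    return "unstranded"


if __name__ == "__main__":
    print(infer_strandedness(900, 100))
    print(infer_strandedness(100, 900))
    print(infer_strandedness(500, 500))
```

Layout (single-end versus paired-end) would need a separate check on the read files themselves, which this sketch does not attempt.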
Project description:DNA microarray and RNA-seq analyses were performed on samples from four controls and eight SSc patients to test the performance of intrinsic subset classification on two different gene expression profiling platforms. Samples N0901, N0903, N1002, N1003, SSc0882, SSc0916, SSc0918, and S0920 were previously deposited in the NCBI SRA under PRJNA237826. The remaining four SSc RNA-seq samples will be available under PRJNAXXXXXX.