Dataset Information

RNA-seq from ENCODE/Caltech (Mouse)

ABSTRACT: RNA-seq is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly (Mortazavi et al., 2008). RNA-seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high-throughput DNA sequencing, which was done here on the Illumina HiSeq sequencer. The transcriptome measurements shown on these tracks were performed on polyA-selected RNA (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=longPolyA&type=rnaExtract) from total cellular RNA (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=cell&type=localization). PolyA-selected RNA was fragmented by magnesium-catalyzed hydrolysis, then converted into cDNA by random priming and amplified. Paired-end 2 × 100 bp reads were obtained from each end of a cDNA fragment. Reads were aligned to the mm9 mouse reference genome using TopHat (Trapnell et al., 2009), a program specifically designed to align RNA-seq reads and discover splice junctions de novo. All sequence and alignment files are available at http://hgwdev.cse.ucsc.edu/cgi-bin/hgFileUi?db=mm9&g=wgEncodeCaltechRnaSeq.
Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Cells were lysed in RLT buffer (Qiagen RNeasy kit) and processed on RNeasy midi columns according to the manufacturer's protocol, including the "on-column" DNase digestion step to remove residual genomic DNA. To isolate mRNA from each preparation, 75 µg of total RNA was selected twice with oligo-dT beads (Dynal) according to the manufacturer's protocol. Then 100 ng of mRNA was processed according to the protocol in Mortazavi et al. (2008) and prepared for sequencing on the Illumina GAIIx or HiSeq platforms following the protocol for the ChIP-Seq genomic DNA kit (Illumina). Paired-end libraries were size-selected to a fragment length of about 200 bp.
Libraries were sequenced on the Illumina HiSeq according to the manufacturer's recommendations, yielding 100 bp paired-end reads. Reads were mapped to the mouse reference genome (mm9, with or without the Y chromosome depending on the sex of the cell line, and without the random chromosomes in all cases) using TopHat version 1.3.1 (http://tophat.cbcb.umd.edu/). TopHat was run with default settings, except that an empirically determined mean inner-mate distance was specified and known Ensembl release 63 splice junctions were supplied.
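The mean inner-mate distance passed to TopHat (the `-r`/`--mate-inner-dist` option in TopHat 1.x) is the expected gap between the two reads of a pair, i.e. the fragment length minus both read lengths. With the nominal values above (about 200 bp fragments, 2 × 100 bp reads) this is close to 0. Because the value actually used was determined empirically, the minimal sketch below also computes it from a list of observed fragment lengths; those lengths are hypothetical, as is the helper name.

```python
from statistics import mean


def mean_inner_mate_distance(fragment_lengths, read_length):
    """Mean inner distance between mates: fragment length minus both reads.

    fragment_lengths: fragment (insert) lengths in bp, e.g. from a pilot
    alignment or from the size-selection target (hypothetical input).
    read_length: length of each mate in bp (both mates assumed equal).
    """
    return mean(fragment_lengths) - 2 * read_length


# Nominal values from the text: ~200 bp fragments, 2 x 100 bp reads.
print(mean_inner_mate_distance([200], 100))  # 0

# Hypothetical fragment lengths observed in a pilot alignment.
print(mean_inner_mate_distance([210, 230, 250, 190], 100))  # 20
```

A negative result would mean the mates overlap, which can happen when fragments are shorter than twice the read length.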

ORGANISM(S): Mus musculus

SUBMITTER: UCSC ENCODE DCC

PROVIDER: E-GEOD-37909 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

Project description: This track displays a chromatin state segmentation for each of nine human cell types (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=GM12878,H1-hESC,HepG2,HUVEC,HMEC,HSMM,K562,NHEK,NHLF). A common set of states across the cell types was learned by computationally integrating ChIP-seq data for nine factors plus input (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=CTCF,H3K4me1,H3K4me2,H3K4me3,H3K27ac,H3K9ac,H3K36me3,H4K20me1,H3K27me3,Input) using a Hidden Markov Model (HMM). In total, fifteen states were used to segment the genome, and these states were then grouped and colored to highlight predicted functional elements. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf.
ChIP-seq data from the Broad Histone (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeBroadChipSeq) track was used to generate this track. Data for nine factors plus input (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=CTCF,H3K4me1,H3K4me2,H3K4me3,H3K27ac,H3K9ac,H3K36me3,H4K20me1,H3K27me3,Input) and nine cell types (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=GM12878,H1-hESC,HepG2,HUVEC,HMEC,HSMM,K562,NHEK,NHLF) were binarized separately at 200 base pair resolution based on a Poisson background model. The chromatin states were learned from these binarized data using a multivariate HMM that explicitly models the combinatorial patterns of observed modifications (Ernst and Kellis, 2010). To learn a common set of states across the nine cell types, the genomes of all cell types were first concatenated. For each of the nine cell types, each 200 base pair interval was then assigned to its most likely state under the model. Detailed information about the model parameters and state enrichments can be found in (Ernst et al., accepted).
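The two steps described above (Poisson binarization of 200 bp bins, then assigning each bin its most likely state under a multivariate HMM) can be sketched in Python. This is a minimal illustration under stated assumptions, not the ChromHMM implementation: the background mean `lam`, the 1e-4 p-value cutoff, the toy two-state, two-mark parameters and the independent-Bernoulli emission model are assumptions, "most likely state" is read here as the state with the highest posterior probability, and parameter learning (done in the track on the concatenated genomes) is omitted in favor of hand-set parameters.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(counts, lam, p_threshold=1e-4):
    """Call a mark present (1) in a bin if its read count is unlikely
    under the Poisson background with mean lam (assumed cutoff)."""
    return [1 if poisson_upper_tail(c, lam) < p_threshold else 0 for c in counts]


def emission(present_probs, obs):
    """Probability of a binary mark vector given one state's per-mark
    presence probabilities (marks treated as independent)."""
    prob = 1.0
    for p, o in zip(present_probs, obs):
        prob *= p if o else 1.0 - p
    return prob


def most_likely_states(obs_seq, init, trans, emit):
    """Scaled forward-backward; returns the state with the highest
    posterior probability for every bin."""
    n, k = len(obs_seq), len(init)
    fwd, scales = [], []
    for t, obs in enumerate(obs_seq):
        e = [emission(emit[s], obs) for s in range(k)]
        if t == 0:
            row = [init[s] * e[s] for s in range(k)]
        else:
            prev = fwd[-1]
            row = [sum(prev[r] * trans[r][s] for r in range(k)) * e[s]
                   for s in range(k)]
        c = sum(row)
        fwd.append([x / c for x in row])  # normalized forward variables
        scales.append(c)
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        e = [emission(emit[s], obs_seq[t + 1]) for s in range(k)]
        bwd[t] = [sum(trans[r][s] * e[s] * bwd[t + 1][s] for s in range(k))
                  / scales[t + 1] for r in range(k)]
    # Posterior is proportional to forward * backward at each bin.
    return [max(range(k), key=lambda s: fwd[t][s] * bwd[t][s])
            for t in range(n)]


# Toy data: read counts for two marks in five consecutive 200 bp bins,
# with an assumed background mean of 2 reads per bin.
mark_a = binarize([0, 1, 12, 20, 3], lam=2.0)
mark_b = binarize([1, 0, 15, 11, 2], lam=2.0)
obs = list(zip(mark_a, mark_b))

# Hypothetical parameters: state 0 = quiescent, state 1 = both marks active.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.05, 0.05], [0.9, 0.8]]
states = most_likely_states(obs, init, trans, emit)
print(states)  # [0, 0, 1, 1, 0]
```

Scaling the forward and backward variables at each bin keeps the products from underflowing on genome-length sequences.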
This is release 1 (Jun 2011) of this track, and it is based on the NCBI36/hg18 release of the Broad Histone (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeBroadChipSeq) track. This track has also been lifted over to GRCh37/hg19 (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeBroadHmm). It is anticipated that the HMM methods will be run on the newer GRCh37/hg19 Broad Histone (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeBroadHistone) data and that the results will replace the lifted version.

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Piero Carninci mailto:carninci@riken.jp). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track shows 5' cap analysis gene expression (CAGE) tags and clusters in RNA extracts (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=rnaExtract) from different sub-cellular localizations (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=localization) in multiple cell lines (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType). A CAGE cluster is a region of overlapping tags with an assigned value that represents the expression level. The data in this track were produced as part of the ENCODE Transcriptome Project. Release 2 adds three new download-only files per experiment (Clusters, TSS Gencode 7 and TSS HMM) and four new cell lines (A549, AG04450, BJ and SK-N-SH_RA). Release 1 on hg19 contained the original hg18 data (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeRikenCage), which were remapped and are labeled Generation 0 in this release because they had no replicates. If both old- and new-generation data are available for a particular experiment, only the new-generation data are displayed, and the older data are available for download. The new data for this track were produced with a different process and have standard replicate numbers. The replicate label in the Genome Browser view is a counter indicating the total number of replicates submitted, whereas the producing lab's replicate numbers correspond to its internal biological-replicate numbering. Where these two numbering systems conflict, both are listed in the long label of the specific track.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf.
Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). RNA molecules longer than 200 nt were isolated from each subcellular compartment and then fractionated into polyA+ and polyA- fractions as described in these protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/general/rnaExtracts.txt). The CAGE tags were sequenced from the 5' ends of cap-trapped cDNAs produced using RIKEN CAGE technology (Kodzius et al. 2006; Valen et al. 2009). To create the tag, a linker was attached to the 5' end of polyA+ or polyA- reverse-transcribed cDNAs that had been selected by cap trapping (Carninci et al. 1996). The first 27 bp of the cDNA were cleaved using class II restriction enzymes. A linker was then attached to the 3' end of the cDNA. After PCR amplification, the tags were sequenced (36 bp single reads) on Illumina's Genome Analyzer. Tags were mapped to the human genome (hg19) using the program delve (T. Lassmann, manuscript in preparation). Delve is a new probabilistic aligner that focuses on giving the best possible alignment of reads to a genome rather than on speed. It calculates the mapping accuracy (the probability that the alignment is correct) for each alignment. There is no fixed limit on the number of errors allowed, so the mapping rate is commonly 100%. However, for analysis it is recommended to discard alignments with low mapping qualities. Exceptions to the above protocol are the polyA- RNA samples from K562 cytosol, K562 nucleus, and prostate whole cell, which were sequenced using ABI SOLiD (http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html) technology. These reads were mapped with Bowtie using default parameters.
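The recommendation to discard low-quality delve alignments can be illustrated by converting each alignment's probability of being correct into a Phred-scaled mapping quality, the convention the SAM format uses for MAPQ (−10 log10 of the probability that the mapping is wrong), and filtering on it. This is a hedged sketch: delve's actual output format is not described here, so the `(read_id, p_correct)` input, the cap of 60 and the cutoff of 20 are hypothetical choices, as are the function names.

```python
import math


def phred_mapq(p_correct, cap=60.0):
    """Phred-scaled mapping quality from the probability that an
    alignment is correct: -10 * log10(P(alignment is wrong)), capped."""
    p_wrong = 1.0 - p_correct
    if p_wrong <= 0.0:
        return cap  # a certain alignment would otherwise be infinite
    return min(cap, -10.0 * math.log10(p_wrong))


def filter_alignments(alignments, min_mapq=20.0):
    """Keep (read_id, p_correct) pairs whose mapping quality is high enough."""
    return [(read_id, p) for read_id, p in alignments
            if phred_mapq(p) >= min_mapq]


# Hypothetical alignment accuracies for four tags.
alignments = [("tag1", 0.999), ("tag2", 0.9), ("tag3", 0.5), ("tag4", 1.0)]
kept = filter_alignments(alignments)
print([read_id for read_id, _ in kept])  # ['tag1', 'tag4']
```

On the Phred scale, a MAPQ of 20 corresponds to a 1% chance that the alignment is wrong.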
Clusters were defined as regions of overlapping CAGE reads. The expression level was computed as the number of reads making up the cluster, divided by the total number of reads sequenced, times 1 million.
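The cluster definition and expression formula above can be sketched as a single pass over sorted tags. This is a minimal illustration, assuming strand-specific clustering, half-open `(chrom, strand, start, end)` coordinates, and that "overlapping" means sharing at least one base (adjacent tags are not merged); the tag coordinates and library size are made up, and `cage_clusters` is a hypothetical helper.

```python
def cage_clusters(tags, total_reads):
    """Merge overlapping tags into clusters and report tags per million.

    tags: (chrom, strand, start, end) tuples with half-open coordinates.
    total_reads: total number of reads sequenced in the library.
    Returns (chrom, strand, start, end, n_tags, tpm) tuples.
    """
    clusters = []
    for chrom, strand, start, end in sorted(tags):
        last = clusters[-1] if clusters else None
        if last and last[:2] == [chrom, strand] and start < last[3]:
            last[3] = max(last[3], end)  # extend the current cluster
            last[4] += 1
        else:
            clusters.append([chrom, strand, start, end, 1])
    # Expression = tags in cluster / total reads sequenced * 1,000,000.
    return [(c, s, b, e, n, n * 1e6 / total_reads)
            for c, s, b, e, n in clusters]


# Hypothetical 27 bp tags from a library of 2 million reads.
tags = [("chr1", "+", 100, 127), ("chr1", "+", 110, 137),
        ("chr1", "+", 500, 527), ("chr1", "-", 105, 132)]
result = cage_clusters(tags, total_reads=2_000_000)
for row in result:
    print(row)
# ('chr1', '+', 100, 137, 2, 1.0)
# ('chr1', '+', 500, 527, 1, 0.5)
# ('chr1', '-', 105, 132, 1, 0.5)
```

Sorting first means each tag only needs to be compared with the most recent cluster.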

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Scott Tenenbaum mailto:STenenbaum@uamail.albany.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). The RNA binding protein (RBP) associated mRNA sequencing track (RIP-Seq) is produced as part of the Encyclopedia of DNA Elements (ENCODE) Project (http://hgwdev.cse.ucsc.edu/ENCODE/index.html). This track displays transcriptional fragments associated with RBPs in the cell lines (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType) K562 and GM12878, using Ribonomic profiling via Illumina SBS. In eukaryotic organisms, gene regulatory networks require an additional level of coordination that links transcriptional and post-transcriptional processes. Messenger RNAs have traditionally been viewed as passive molecules in the pathway from transcription to translation. However, it is now clear that RNA-binding proteins play a major role in regulating multiple mRNAs in order to facilitate gene expression patterns. These tracks show the associated mRNAs that co-precipitate with the targeted RNA-binding proteins using RIP-Seq profiling. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf.
RBP-mRNA complexes were purified from cells grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). RNA samples were amplified and converted to cDNA with the Nugen (http://www.nugeninc.com/) Ovation© RNA-Seq System and prepared for sequencing with the Illumina (http://www.illumina.com/) mRNA-Seq protocol. Approximately 30 million single-end sequencing reads were obtained for each of K562 and GM12878.
RIP samples were analyzed for signal that was at or above the 60th percentile and statistically enriched compared to the negative control. Sequences were analyzed using TopHat (http://tophat.cbcb.umd.edu/) (Trapnell et al., 2009) with Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) (Langmead et al., 2009). Peaks were called from the top 40% of TopHat-normalized reads, with a max gap : min run of 24:48. Unions of overlapping peak regions from total RNA replicates (RIP-Input) are presented with a p-value from a one-tailed t-test of the average replicate signal versus 0 (no cut-off was used for totals). Replicate overlap for positive RIP treatment peaks (ELAVL1 and PABPC1) is presented with a p-value from a one-tailed t-test versus the signal for the same region in negative control replicates (T7-tag). RIP peaks were limited to sequences longer than 120 bp with p < 0.05. For both totals (RIP-Input) and RIPs, the peak scores are scaled p-values comparing treatment with control.
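The percentile threshold and max-gap/min-run peak calling described above can be sketched on a per-base signal track. This is an assumption-laden illustration, not the submitting lab's pipeline: it reads "max gap : min run of 24:48" as bridging gaps of at most 24 bases and keeping merged regions of at least 48 bases, takes the 60th percentile with the nearest-rank method, and uses a toy signal. The replicate t-tests and the 120 bp / p < 0.05 filters for RIP peaks are not reproduced, and both function names are hypothetical.

```python
import math


def percentile_threshold(signal, pct=60):
    """Nearest-rank percentile of the per-base signal."""
    ranked = sorted(signal)
    k = max(1, math.ceil(pct * len(ranked) / 100))
    return ranked[k - 1]


def maxgap_minrun(signal, threshold, max_gap=24, min_run=48):
    """Return half-open (start, end) regions where signal >= threshold,
    bridging gaps of at most max_gap bases and keeping regions of at
    least min_run bases (gaps included in the length)."""
    peaks = []
    start = end = None
    for i, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = i
            elif i - end > max_gap:  # gap too long: close current region
                if end - start >= min_run:
                    peaks.append((start, end))
                start = i
            end = i + 1
    if start is not None and end - start >= min_run:
        peaks.append((start, end))
    return peaks


# Toy per-base signal: two 30 bp blocks separated by a 10 bp dip.
signal = [0] * 10 + [5] * 30 + [0] * 10 + [5] * 30 + [0] * 20
threshold = percentile_threshold(signal)  # 5
print(maxgap_minrun(signal, threshold))  # [(10, 80)]
```

Neither 30 bp block reaches the 48-base minimum on its own; they survive only because the 10-base dip between them is bridged.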
