Unknown,Transcriptomics,Genomics,Proteomics

Dataset Information

ENCODE SUNY RNA Binding Protein Tiling Data for K562 and GM12878

ABSTRACT: RNA Binding Protein immunoprecipitation (RIP) study for multiple target proteins in K562 & GM12878 cell lines as part of the ENCODE consortium. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Three sample replicates per sample type. Total input RNA and negative target RIP (antibody for bacteriophage coat protein) are provided as controls. Individual samples reflect biological replicates, defined as separate RIP runs (as opposed to technical replicates, defined as multiple arrays run with the same RIP sample as starting material).

ORGANISM(S): Homo sapiens

SUBMITTER: Frank Doyle

PROVIDER: E-GEOD-40691 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress


Similar Datasets

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Jonathan Preall jpreall@cshl.edu (Generation 0 Data from Hannon Lab), Carrie Davis davisc@cshl.edu (experimental), Alex Dobin dobin@cshl.edu (computational), Wei Lin wlin@cshl.edu (computational), Tom Gingeras gingeras@cshl.edu (primary investigator)). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). hg18: This data was produced by the Hannon lab at Cold Spring Harbor as part of the ENCODE Project. The series depicts NextGen sequencing information for RNAs between the sizes of 20-200 nt isolated from RNA samples from tissues or subcellular compartments of cell lines. hg19: This track depicts NextGen sequencing information for RNAs between the sizes of 20-200 nt isolated from RNA samples from tissues or subcellular compartments from ENCODE cell lines. The overall goal of the ENCODE project is to identify and characterize all functional elements in the sequence of the human genome. hg19: This cloning protocol generates directional libraries that are read from the 5' ends of the inserts, which should largely correspond to the 5' ends of the mature RNAs. The libraries were sequenced on a Solexa platform for a total of 36, 50 or 76 cycles; however, the reads undergo post-processing that trims their 3' ends. Consequently, the mapped read lengths are variable. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf hg18: Small RNAs between 20-200 nt were ribominus treated according to the manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs). The RNA was treated with Tobacco Alkaline Pyrophosphatase to eliminate any 5' cap structure. 
Poly-A Polymerase was used to catalyze the addition of C's to the 3' end. The 5' ends were phosphorylated using T4 PNK and an RNA linker was ligated onto the 5' end. Reverse transcription was carried out using a poly-G oligo with a defined 5' extension. The inserts were then amplified using oligos targeting the 5' linker and poly-G extension and containing sequencing adapters. The library was sequenced on an Illumina GA machine for a total of 36, 50 or 76 cycles. Initially, one lane was run. If an appreciable number of mappable reads were obtained, additional lanes were run. Sequence reads underwent quality filtration using the Illumina standard pipeline (GERALD). The read lengths may exceed the insert sizes and consequently introduce 3' adaptor sequence into the 3' end of the reads. The 3' sequencing adaptor was removed from the reads using a custom clipper program, which aligned the adaptor sequence to the short reads, allowing up to 2 mismatches and no indels. Regions that aligned were "clipped" off from the read. The trimmed portions were collapsed into identical reads, their count noted and aligned to the human genome (NCBI build 36, hg18 unmasked) using Nexalign (Lassmann et al., not published). The alignment parameters are tuned to tolerate up to 2 mismatches with no indels and will allow for trimmed portions as small as 5 nucleotides to be mapped. We report reads that mapped 10 or fewer times. Data obtained from each lane is processed and mapped independently. The processed/mapped data from each lane is then compiled as a single track without additional processing and submitted to UCSC. Consequently, identical reads within a lane were collapsed and their count is reported as the "transfrag" signal value. However, the redundancy between lanes has not been eliminated, so the same transfrag may appear multiple times within a signal. 
hg19: Small RNAs between 20-200 nt were ribominus treated according to the manufacturer's protocol (Invitrogen) using custom LNA probes targeting ribosomal RNAs (some datasets are also depleted of U snRNAs and highly abundant microRNAs). The RNA was treated with Tobacco Alkaline Pyrophosphatase to eliminate any 5' cap structures. Poly-A Polymerase was used to catalyze the addition of C's to the 3' end. The 5' ends were phosphorylated using T4 PNK and an RNA linker was ligated onto the 5' end. Reverse transcription was carried out using a poly-G oligo with a defined 5' extension. The inserts were then amplified using oligos targeting the 5' linker and poly-G extension and containing sequencing adapters. The library was sequenced on an Illumina GA machine for a total of 36, 50 or 76 cycles. Initially, one lane was run. If an appreciable number of mappable reads were obtained, additional lanes were run. Sequence reads underwent quality filtration using the Illumina standard pipeline (GERALD). The Illumina reads were initially trimmed to discard any bases following a quality score less than or equal to 20 and converted into FASTA format, thereby discarding quality information for the rest of the pipeline. As a result, the sequence quality scores in the BAM output are all displayed as "40" to indicate no quality information. The read lengths may exceed the insert sizes and consequently introduce 3' adapter sequence into the 3' end of the reads. The 3' sequencing adapter was removed from the reads using a custom clipper program (available at http://hannonlab.cshl.edu/fastx_toolkit/), which aligned the adapter sequence to the short reads, allowing up to 2 mismatches and no indels. Regions that aligned were "clipped" off from the read. Terminal C nucleotides introduced at the 3' end of the RNA via the cloning procedure were also trimmed. 
The trimmed portions were collapsed into identical reads, their count noted and aligned to the human genome (version hg19, using the gender build appropriate to the sample in question - female/male) using Bowtie (Langmead B. et al). The alignment parameter allowed 0, 1, or 2 mismatches iteratively. We report reads that mapped 20 or fewer times. Discrepancies between hg18 and hg19 versions of CSHL small RNA data: The alignment pipeline for the CSHL small RNA data was updated upon the release of the human genome version hg19, resulting in a few noteworthy discrepancies with the hg18 dataset. First, mapping was conducted with the open-source Bowtie algorithm (http://bowtie-bio.sourceforge.net/index.shtml) rather than the custom NexAlign software. As each algorithm uses different strategies to perform alignments, the mapping results may vary even in genomic regions that do not differ between builds. The read processing pipeline also varies slightly, in that we no longer retain information regarding whether a read was 'clipped' off adapter sequence.
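The clipping and collapsing steps described above can be illustrated with a minimal sketch. This is not the Hannon lab clipper or the FASTX-Toolkit code; it only mirrors the stated rule (adapter aligned to the read with up to 2 mismatches and no indels, aligned region removed, identical trimmed reads counted). The function names, the min_overlap parameter, and the example adapter string are assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def clip_adapter(read, adapter, max_mismatches=2, min_overlap=5):
    """Remove a 3' adapter (or a prefix of it) from a read.

    The adapter is slid along the read without gaps (no indels). At the
    leftmost position where the overlapping region has at most
    `max_mismatches` mismatches, the read is cut. `min_overlap` is a
    hypothetical guard against clipping on very short chance matches.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        if hamming(read[start:start + overlap], adapter[:overlap]) <= max_mismatches:
            return read[:start]
    return read


def collapse_reads(reads):
    """Collapse identical sequences and record how often each occurred."""
    return Counter(reads)


if __name__ == "__main__":
    adapter = "CTGTAGGCAC"  # made-up adapter, for illustration only
    raw = ["AAAAAAAACTGTAGGCAC", "AAAAAAAACTGTAGGAAC", "AAAAAAAAAAAA"]
    clipped = [clip_adapter(r, adapter) for r in raw]
    for seq, count in collapse_reads(clipped).items():
        print(seq, count)
```

Because the leftmost acceptable match wins, a mismatch-tolerant clipper of this kind can over-trim reads whose insert happens to resemble the adapter; a real pipeline would need to weigh that against missed adapters.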

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Scott Tenenbaum mailto:STenenbaum@uamail.albany.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). The RNA binding protein (RBP) associated mRNA sequencing track (RIP-Seq) is produced as part of the Encyclopedia of DNA Elements (ENCODE) Project (http://hgwdev.cse.ucsc.edu/ENCODE/index.html). This track displays transcriptional fragments associated with RBP in cell lines (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType) K562 and GM12878, using Ribonomic profiling via Illumina SBS. In eukaryotic organisms gene regulatory networks require an additional level of coordination that links transcriptional and post-transcriptional processes. Messenger RNAs have traditionally been viewed as passive molecules in the pathway from transcription to translation. However, it is now clear that RNA-binding proteins play a major role in regulating multiple mRNAs in order to facilitate gene expression patterns. These tracks show the associated mRNAs that co-precipitate with the targeted RNA-binding proteins using RIP-Seq profiling. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf RBP-mRNA complexes were purified from cells grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). RNA samples were amplified and converted to cDNA with the Nugen (http://www.nugeninc.com/) Ovation© RNA-Seq System and prepped for sequencing with the Illumina (http://www.illumina.com/) mRNA-Seq protocol. Approximately 30 million single end sequencing reads were obtained for each K562 and GM12878. 
RIP samples were analyzed for signal that was at or above the 60th percentile and statistically enriched compared to the negative control. Sequences were analyzed using TopHat (http://tophat.cbcb.umd.edu/) (Trapnell et al., 2009) with Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) (Langmead et al., 2009). Peaks were called from the top 40% of TopHat normalized reads, with a max gap, min run of (24:48). Unions of overlapping peak regions from total RNA replicates (RIP-Input) are presented with a p-value from a one-tailed t-test for average signal from replicates versus 0 (no cut-off was used for totals). Replicate overlaps for positive RIP treatment peaks (ELAVL1 and PABPC1) are presented with a p-value from a one-tailed t-test versus signal for the same region in negative control replicates (T7-tag). RIP peaks were from sequences longer than 120 bp and with p-value < 0.05. For both totals (RIP-input) and RIPs, the peak scores are scaled relative p-values between treatment and control.
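One way to read the "max gap, min run" peak-calling rule above is sketched below. This is an interpretation, not the submitters' code: positions at or above a threshold are joined into one region when separated by at most max_gap sub-threshold positions, and a region is kept only if it spans at least min_run positions. Reading (24:48) as max_gap=24 and min_run=48 is an assumption, as are the function names and the nearest-rank percentile used for the 60th-percentile cut-off.

```python
def percentile(values, pct):
    """Nearest-rank percentile (pct is an integer from 1 to 100)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def call_peaks(signal, threshold, max_gap, min_run):
    """Return half-open (start, end) intervals of enriched signal.

    Positions with signal >= threshold are merged when the stretch of
    sub-threshold positions between them is no longer than `max_gap`;
    merged regions shorter than `min_run` positions are discarded.
    """
    peaks = []
    start = None       # first position of the current region
    last_above = None  # most recent position at or above threshold
    for pos, value in enumerate(signal):
        if value < threshold:
            continue
        if start is None:
            start = pos
        elif pos - last_above - 1 > max_gap:
            # Gap too long: close the current region, open a new one.
            if last_above + 1 - start >= min_run:
                peaks.append((start, last_above + 1))
            start = pos
        last_above = pos
    if start is not None and last_above + 1 - start >= min_run:
        peaks.append((start, last_above + 1))
    return peaks


if __name__ == "__main__":
    signal = [0, 5, 5, 0, 5, 5, 0, 0, 0, 5, 0]
    cutoff = percentile(signal, 60)  # keep roughly the top 40% of positions
    print(call_peaks(signal, cutoff, max_gap=1, min_run=3))
```

The statistical step (one-tailed t-test against the negative control) is left out of the sketch; it would be applied to the regions returned here.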

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli mailto:fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track is produced as part of the ENCODE project. The track displays the methylation status of specific CpG dinucleotides in the given cell types as identified by the Illumina Infinium HumanMethylation27 BeadArray platform (http://www.illumina.com/pages.ilmn?ID=243). In general, methylation of CpG sites within a promoter causes silencing of the gene associated with that promoter. Detailed information for the CpG targets is in an XLS formatted spreadsheet on the Myers' lab protocols website (http://hudsonalpha.org/myers-lab/protocols). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). Genomic DNA was isolated from each cell line with the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. The DNA concentration and quality of each preparation were determined by fluorescence with the Qubit Fluorometer (Invitrogen). The Methyl27K platform uses bisulfite treated genomic DNA to assay the methylation status of 27,578 CpG sites within more than 14,000 genes. When genomic DNA is treated with sodium bisulfite, unmethylated cytosines of CpG dinucleotides are converted into uracils; methylated cytosines are not converted. After bisulfite treatment, the methylation status of a site is assayed by single base-pair extension with a Cy3 or Cy5 labeled nucleotide on oligo-beads specific for the methylated or unmethylated state. 
A beta value is calculated by Illumina's BeadStudio software for each CpG target. This value represents the intensity value from the methylated bead type divided by the sum of the intensity values from the methylated and unmethylated bead types for any given CpG target. The bisulfite conversion reaction was done using the Zymo Research EZ-96 DNA Methylation Kit (http://www.zymoresearch.com/epigenetics/dna-methylation/ez-96-dna-methylation-kit). One step of the protocol was modified. During the incubation, a 30 sec 95 °C denaturing step every hour was included to increase reaction efficiency as recommended by the Illumina Infinium Human Methylation27 protocol. The bead arrays were run according to the protocol provided by Illumina (http://www.illumina.com/pagesnrn.ilmn?ID=275). The intensity data from the BeadArray was processed using Illumina's BeadStudio software with the Methylation Module v3.2. The data was then quality-filtered using p-values. Any beta value equal to or greater than 0.6 is considered fully methylated. Any beta value equal to or less than 0.2 is considered to be fully unmethylated. Beta values between 0.2 and 0.6 are considered to be partially methylated. Beta values are quality filtered, and spots that fall below the minimum intensity threshold are displayed as "NA". The score in the BED files is the beta value × 1000.
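The beta value, the three methylation classes, and the BED score described above can be written out as a short sketch. It follows the definition given in the text (methylated intensity divided by the sum of methylated and unmethylated intensities) and is not BeadStudio's implementation. The function names are hypothetical, and because the minimum intensity threshold is not stated, the sketch only treats a zero total intensity as "NA".

```python
def beta_value(methylated, unmethylated):
    """Beta = M / (M + U); returns None when there is no signal at all."""
    total = methylated + unmethylated
    if total == 0:
        return None  # assumption: stands in for the unstated intensity filter
    return methylated / total


def classify(beta):
    """Map a beta value onto the classes used in the track description."""
    if beta is None:
        return "NA"
    if beta >= 0.6:
        return "fully methylated"
    if beta <= 0.2:
        return "fully unmethylated"
    return "partially methylated"


def bed_score(beta):
    """BED score is the beta value multiplied by 1000, as an integer."""
    return round(beta * 1000)


if __name__ == "__main__":
    for m, u in [(300, 100), (200, 300), (100, 400), (0, 0)]:
        b = beta_value(m, u)
        score = bed_score(b) if b is not None else "NA"
        print(m, u, classify(b), score)
```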
