Dataset Information

Transcription Factor Binding Sites by ChIP-seq from ENCODE/HAIB

ABSTRACT: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli, fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). The ChIP-Seq method was used to assay chromatin fragments bound by specific or general transcription factors as described below. DNA isolated by ChIP-Seq was size-selected (~225 bp) and sequenced. Short reads of 25-36 bp were mapped to the human reference genome, and enriched regions of high read density relative to total input chromatin control reads were identified. The sequence reads with quality scores (fastq files) and alignment coordinates (BAM files) from these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). Cross-linked chromatin was immunoprecipitated with an antibody. The protein:DNA crosslinks were then reversed and the DNA fragments were recovered and sequenced. Please see the protocol notes below and go to http://hudsonalpha.org/myers-lab/protocols for the most current version of the protocol. Biological replicates from each experiment were completed. Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Sequence data produced by the Illumina data pipeline software were quality filtered and then mapped to NCBI Build37 (hg19) using the integrated Eland software; 32 nt of the sequence reads were used for alignment; up to two mismatches were tolerated; reads that mapped to multiple sites in the genome were discarded.
To identify likely binding sites, peak calling was applied to the aligned sequence data sets using Model-based Analysis of ChIP-Seq (MACS; Zhang Y, et al., 2008) (http://liulab.dfci.harvard.edu/MACS/00README.html). MACS models the shift size of ChIP-Seq tags empirically and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to capture local biases in the genome, allowing for more robust predictions (Zhang Y, et al., 2008). Protocol Notes: Several changes and improvements were made to the original ChIP-Seq protocol (Johnson et al., 2008). The major differences between protocols are the number of cells and magnetic beads used for IP, the method of sonication used to fragment DNA, and the number of cycles of PCR used to amplify the sequencing library. The most current protocol used by the Myers lab can be found at http://hudsonalpha.org/myers-lab/protocols. The protocol field for each file denotes the version of the protocol used as PCR1x, PCR2x, or a version number (for example, v041610.1). The sequencing libraries labeled as PCR2x were made with two rounds of amplification (25 and 15 cycles) and those labeled as PCR1x were made with one 15-cycle round of amplification. These experiments were completed prior to January 2010 and were originally aligned to NCBI Build36 (hg18). They have been re-aligned to NCBI Build37 (hg19) with the Bowtie software (Langmead, et al., 2009) for this data release (http://bowtie-bio.sourceforge.net/index.shtml). The libraries labeled with a protocol version number were completed after January 2010 and were only aligned to NCBI Build37 (hg19). Please refer to the Myers Lab website (http://hudsonalpha.org/myers-lab/protocols) for details on each protocol version. Verification: The MACS (http://liulab.dfci.harvard.edu/MACS/00README.html) peak caller was used to call significant peaks on the individual replicates of a ChIP-Seq experiment.
Afterwards, the irreproducible discovery rate (IDR) method, developed by Li et al. (submitted), was used to quantify the consistency between pairs of ranked peak lists from replicates. The IDR method uses a model that assumes that the ranked lists of peaks in a pair of replicates consist of two groups: a reproducible group and an irreproducible group. In general, the signals in the reproducible group are more consistent (i.e. with a larger rank correlation coefficient) and are ranked higher than those in the irreproducible group. The proportion of peaks that belong to the irreproducible component and the correlation of the reproducible component are estimated adaptively from the data. The model also provides an IDR score for each peak, which reflects the posterior probability of the peak belonging to the irreproducible group. The aligned reads were pooled from all replicates and the MACS peak caller was used to call significant peaks on the pooled data. Only datasets containing at least 100 peaks passing the IDR threshold are considered valid and submitted for release.
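
The release criterion described above (at least 100 peaks passing the IDR threshold) can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name and the example threshold of 0.01 are hypothetical, since the abstract does not state the threshold value, and the IDR scores themselves are assumed to have been computed already by the IDR model.

```python
from typing import Iterable

# From the abstract: a dataset needs at least 100 peaks passing the IDR threshold.
MIN_PEAKS = 100


def passes_release_criterion(idr_scores: Iterable[float], idr_threshold: float) -> bool:
    """Return True if enough peaks have an IDR score at or below the threshold.

    idr_threshold is an assumption supplied by the caller; the abstract
    does not give its value.
    """
    n_passing = sum(1 for score in idr_scores if score <= idr_threshold)
    return n_passing >= MIN_PEAKS


if __name__ == "__main__":
    # Hypothetical scores: 120 reproducible peaks and 300 irreproducible ones.
    scores = [0.001] * 120 + [0.5] * 300
    print(passes_release_criterion(scores, idr_threshold=0.01))
```

A lower IDR score means the peak is less likely to belong to the irreproducible group, so the comparison keeps peaks at or below the cutoff.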

ORGANISM(S): Homo sapiens

SUBMITTER: ENCODE DCC

PROVIDER: E-GEOD-32465 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Similar Datasets

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Florencia Pauli, fpauli@hudsonalpha.org). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). This track is produced as part of the ENCODE project. The track displays the methylation status of specific CpG dinucleotides in the given cell types as identified by the Illumina Infinium HumanMethylation27 BeadArray platform (http://www.illumina.com/pages.ilmn?ID=243). In general, methylation of CpG sites within a promoter causes silencing of the gene associated with that promoter. Detailed information for the CpG targets is in an XLS-formatted spreadsheet on the Myers lab protocols website (http://hudsonalpha.org/myers-lab/protocols). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). Genomic DNA was isolated from each cell line with the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. The DNA concentration and quality of each preparation were determined by fluorescence with the Qubit Fluorometer (Invitrogen). The Methyl27K platform uses bisulfite-treated genomic DNA to assay the methylation status of 27,578 CpG sites within more than 14,000 genes. When genomic DNA is treated with sodium bisulfite, unmethylated cytosines of CpG dinucleotides are converted into uracils; methylated cytosines are not converted. After bisulfite treatment, the methylation status of a site is assayed by single base-pair extension with a Cy3- or Cy5-labeled nucleotide on oligo-beads specific for the methylated or unmethylated state.
A beta value is calculated by Illumina's BeadStudio software for each CpG target. This value represents the intensity value from the methylated bead type divided by the sum of the intensity values from the methylated and unmethylated bead types for any given CpG target. The bisulfite conversion reaction was performed using the Zymo Research EZ-96 DNA Methylation Kit (http://www.zymoresearch.com/epigenetics/dna-methylation/ez-96-dna-methylation-kit). One step of the protocol was modified. During the incubation, a 30 sec 95 °C denaturing step every hour was included to increase reaction efficiency as recommended by the Illumina Infinium Human Methylation27 protocol. The bead arrays were run according to the protocol provided by Illumina (http://www.illumina.com/pagesnrn.ilmn?ID=275). The intensity data from the BeadArray was processed using Illumina's BeadStudio software with the Methylation Module v3.2. The data was then quality-filtered using p-values. Any beta value equal to or greater than 0.6 is considered fully methylated. Any beta value equal to or less than 0.2 is considered fully unmethylated. Beta values between 0.2 and 0.6 are considered partially methylated. Beta values are quality filtered, and spots that fall below the minimum intensity threshold are displayed as "NA". The score in the BED files is the beta value × 1000.
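
The beta value, the three methylation classes, and the BED score described above can be sketched as follows. This is a minimal illustration that follows the formula as stated in the text (methylated intensity divided by the sum of methylated and unmethylated intensities); it is not BeadStudio's implementation. The function names and the `min_intensity` parameter are hypothetical, because the description does not give the minimum intensity threshold.

```python
from typing import Optional


def beta_value(methylated: float, unmethylated: float,
               min_intensity: float = 0.0) -> Optional[float]:
    """Beta = methylated / (methylated + unmethylated), as stated in the text.

    Returns None (shown as "NA") when the total intensity is not above
    min_intensity. The default of 0.0 is a placeholder assumption.
    """
    total = methylated + unmethylated
    if total <= min_intensity:
        return None
    return methylated / total


def classify(beta: Optional[float]) -> str:
    """Apply the 0.6 and 0.2 cutoffs given in the description."""
    if beta is None:
        return "NA"
    if beta >= 0.6:
        return "fully methylated"
    if beta <= 0.2:
        return "fully unmethylated"
    return "partially methylated"


def bed_score(beta: Optional[float]) -> Optional[int]:
    """BED score is the beta value times 1000, rounded to an integer here."""
    return None if beta is None else round(beta * 1000)


if __name__ == "__main__":
    for m, u in [(300.0, 100.0), (50.0, 450.0), (200.0, 300.0), (0.0, 0.0)]:
        b = beta_value(m, u)
        print(m, u, b, classify(b), bed_score(b))
```

Rounding the score to an integer is a choice made for this sketch, since BED scores are conventionally integers in the range 0 to 1000.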

Project description: This track is produced as part of the mouse ENCODE Project. This track shows DNaseI sensitivity measured genome-wide in mouse tissues and cell lines using the Digital DNaseI methodology (see below), and DNaseI hypersensitive sites. DNaseI has long been used to map general chromatin accessibility, and DNaseI hypersensitivity is a universal feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, insulators, promoters, locus control regions and novel elements. For each experiment (tissue/cell type) this track shows DNaseI sensitivity as a continuous function using sequencing tag density (Signal), and discrete loci of DNaseI sensitive zones (HotSpots) and hypersensitive sites (Peaks). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Fresh tissues were harvested from mice and the nuclei prepared according to the tissue-appropriate protocol (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Digital DNaseI was performed by DNaseI digestion of intact nuclei, isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Illumina IIx (and Illumina HiSeq by early 2011) platform (36 bp reads). High-quality, uniquely mapping reads were aligned to the genome using the Bowtie aligner. DNaseI sensitivity is directly reflected in raw tag density, which is shown in the track as density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI sensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004).
1.0% false discovery rate thresholds (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within FDR 1.0% hypersensitive zones using a peak-finding algorithm (I-max).
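
The tag density signal described above (tags counted within a 150 bp window moved in 20 bp steps) can be sketched as below. This is an illustration only, not the pipeline's code: the function name, the half-open window convention, and the use of single-chromosome integer coordinates are assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def tag_density(cut_sites: Sequence[int], region_start: int, region_end: int,
                window: int = 150, step: int = 20) -> List[Tuple[int, int]]:
    """Count tags in each half-open window [start, start + window).

    Window size and step default to the 150 bp and 20 bp given in the text.
    Returns (window_start, tag_count) pairs for windows that fit entirely
    inside [region_start, region_end).
    """
    sites = sorted(cut_sites)
    densities = []
    for start in range(region_start, region_end - window + 1, step):
        lo = bisect_left(sites, start)           # first site >= window start
        hi = bisect_left(sites, start + window)  # first site >= window end
        densities.append((start, hi - lo))
    return densities


if __name__ == "__main__":
    # Hypothetical cleavage-site positions on one chromosome.
    print(tag_density([10, 20, 160], region_start=0, region_end=200))
```

Sorting once and using binary search keeps each window lookup logarithmic in the number of tags, which matters when the same scan is run genome-wide.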

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Scott Tenenbaum, STenenbaum@uamail.albany.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (genome@soe.ucsc.edu). The RNA binding protein (RBP) associated mRNA sequencing track (RIP-Seq) is produced as part of the Encyclopedia of DNA Elements (ENCODE) Project (http://hgwdev.cse.ucsc.edu/ENCODE/index.html). This track displays transcriptional fragments associated with RBPs in the cell lines (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType) K562 and GM12878, using Ribonomic profiling via Illumina SBS. In eukaryotic organisms, gene regulatory networks require an additional level of coordination that links transcriptional and post-transcriptional processes. Messenger RNAs have traditionally been viewed as passive molecules in the pathway from transcription to translation. However, it is now clear that RNA-binding proteins play a major role in regulating multiple mRNAs in order to facilitate gene expression patterns. These tracks show the associated mRNAs that co-precipitate with the targeted RNA-binding proteins using RIP-Seq profiling. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. RBP-mRNA complexes were purified from cells grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). RNA samples were amplified and converted to cDNA with the Nugen (http://www.nugeninc.com/) Ovation© RNA-Seq System and prepped for sequencing with the Illumina (http://www.illumina.com/) mRNA-Seq protocol. Approximately 30 million single-end sequencing reads were obtained for each of K562 and GM12878.
RIP samples were analyzed for signal that was at or above the 60th percentile and statistically enriched compared to the negative control. Sequences were analyzed using TopHat (http://tophat.cbcb.umd.edu/) (Trapnell et al., 2009) with Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) (Langmead et al., 2009). Peaks were called from the top 40% of TopHat normalized reads, with a max gap, min run of (24:48). Unions of overlapping peak regions from total RNA replicates (RIP-Input) are presented with a p-value from a one-tailed t-test for average signal from replicates versus 0 (no cut-off was used for totals). Replicate overlaps for positive RIP treatment peaks (ELAVL1 and PABPC1) are presented with a p-value from a one-tailed t-test versus signal for the same region in negative control replicates (T7-tag). RIP peaks were from sequences longer than 120 bp and had a p-value < 0.05. For both totals (RIP-input) and RIPs, the peak scores are scaled relative p-values between treatment and control.
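
One common way to apply a "max gap, min run" rule is sketched below: positions whose signal passes the cutoff are joined into a run when consecutive positions are no more than the max gap apart, and runs shorter than the min run are dropped. This is an illustration under loud assumptions, not the track's actual peak caller. Reading "(24:48)" as a max gap of 24 bp and a min run of 48 bp is an interpretation the description does not confirm, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def call_runs(positions: Iterable[int], max_gap: int = 24,
              min_run: int = 48) -> List[Tuple[int, int]]:
    """Group above-cutoff positions into runs and keep the long ones.

    max_gap=24 and min_run=48 are ASSUMED readings of "(24:48)".
    Returns inclusive (start, end) coordinates of each retained run.
    """
    sorted_pos = sorted(positions)
    runs: List[Tuple[int, int]] = []
    if not sorted_pos:
        return runs

    start = prev = sorted_pos[0]
    for pos in sorted_pos[1:]:
        if pos - prev > max_gap:
            # Gap too large: close the current run and start a new one.
            if prev - start + 1 >= min_run:
                runs.append((start, prev))
            start = pos
        prev = pos

    # Close the final run.
    if prev - start + 1 >= min_run:
        runs.append((start, prev))
    return runs


if __name__ == "__main__":
    # Hypothetical positions whose normalized signal is in the top 40%.
    above_cutoff = list(range(100, 160)) + [170, 500, 510]
    print(call_runs(above_cutoff))
```

In this example the position at 170 is absorbed into the first run because it lies within the assumed gap, while the short run near 500 is discarded.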
