Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Yijun Ruan mailto:ruanyj@gis.a-star.edu.sg). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique depends on the strength and tightness of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA-degrading endonucleases, while the exposed, unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique was adopted to define the DNase hypersensitive sites in the genome. Sequencing these DNase samples to significantly higher depths, 300-fold or greater, produces DNase susceptibility maps of the native chromatin state at base-pair resolution. These base-pair-resolution maps depend on the nature and specificity of the interactions between the DNA and the regulatory or modulatory proteins bound at specific loci in the genome; they therefore reflect the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA-binding proteins.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolation of DNaseI 'double-hit' fragments as described in Sabo et al. (2006) and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the GRCh37/hg19 human genome using Bowtie 0.12.5 (Eland was used to map to NCBI36/hg18); only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as the density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags.
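The Signal definition above (tags counted in a 150 bp window moved in 20 bp steps) can be illustrated with a small sketch. This is not the pipeline used to produce the track; the function name `windowed_tag_density`, the 0-based half-open coordinates, and the choice to stop at the last full window are assumptions made for illustration.

```python
from bisect import bisect_left


def windowed_tag_density(cut_sites, chrom_length, window=150, step=20):
    """Count tags falling inside each sliding window on one chromosome.

    cut_sites    -- 0-based positions of uniquely mapped tag ends (assumed input)
    chrom_length -- chromosome length in bp
    Returns a list of (window_start, window_end, tag_count) tuples, where
    each window is the half-open interval [window_start, window_end).
    """
    sites = sorted(cut_sites)
    densities = []
    # Only full-length windows are reported (an illustrative assumption).
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        end = start + window
        # Number of sorted positions p with start <= p < end.
        count = bisect_left(sites, end) - bisect_left(sites, start)
        densities.append((start, end, count))
    return densities


if __name__ == "__main__":
    for row in windowed_tag_density([10, 15, 160, 165, 170], chrom_length=300):
        print(row)
```

Sorting once and using binary search keeps each window lookup logarithmic in the number of tags, which matters when windows overlap heavily, as they do with a 20 bp step.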

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (mailto:nshoresh@broad.mit.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track displays maps of chromatin state generated by the Broad/MGH ENCODE group using ChIP-seq. Chemical modifications (methylation, acetylation) to the histone proteins present in chromatin influence gene expression by changing how accessible the chromatin is to transcription. The ChIP-seq method involves first using formaldehyde to cross-link histones and other DNA-associated proteins to genomic DNA within cells. The cross-linked chromatin is subsequently extracted, mechanically sheared, and immunoprecipitated using specific antibodies. After reversal of cross-links, the immunoprecipitated DNA is sequenced and mapped to the human reference genome. The relative enrichment of each antibody target (epitope) across the genome is inferred from the density of mapped fragments.

For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

ChIP-seq: Cells were grown according to the approved ENCODE cell culture protocols. Cells were fixed in 1% formaldehyde and resuspended in lysis buffer. Chromatin was sheared to 200-700 bp using a Diagenode Bioruptor. Solubilized chromatin was immunoprecipitated with antibodies against each of the target epitopes. Antibody-chromatin complexes were pulled down using protein A-sepharose (or anti-IgM-conjugated agarose for RNA polymerase II), washed, and then eluted. After cross-link reversal and proteinase K treatment, immunoprecipitated DNA was extracted with phenol-chloroform, ethanol precipitated, treated with RNase, and purified.
One to ten nanograms of DNA were end-repaired, adapter-ligated, and sequenced on Illumina Genome Analyzers as recommended by the manufacturer.

Alignment: Sequence reads from each IP experiment were aligned to the human reference genome (GRCh37/hg19) using MAQ with default parameters, except for '-C 11' and '-H output_file', which output up to 11 additional best matches for each read (if any are found) to a file. This information was used to filter out any read that had more than 10 best matches on the genome. Note: Instances where multiple reads align to the same position with the same orientation are likely due to enhanced PCR amplification of a single DNA fragment. Following ENCODE practices, however, no attempt has been made to remove such artifacts from the data.

Signal: Fragment densities were computed by counting the number of reads overlapping each 25 bp bin along the genome. Densities were computed using igvtools count with default parameters (in particular, '-w 25' to set a window size of 25 bp and '-f mean' to report the mean value across the window), except for '-e', which was set to extend the reads to 200 bp. The .wig output was converted to bigWig using wigToBigWig from the UCSC Kent software package.

Peaks: Discrete intervals of ChIP-seq fragment enrichment were identified using Scripture, a scan statistics approach, under the assumption of uniform background signal. All data sets were processed with '-task chip' and with '-windows 100,200,500,1000,5000,10000,100000'. (Neither a mask file nor the '-trim' option was used.) The resulting called segments were then further filtered to remove intervals that are significantly enriched only because they contain smaller enriched intervals within them. This post-processing step was implemented in Matlab. It allowed very large enriched intervals (on the order of megabases for H3K27me3, for instance) to be detected, as well as much smaller intervals, without the need to tailor the parameters of Scripture based on prior expectations.
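The Signal computation described above (reads extended to 200 bp, then summarized in 25 bp bins with a mean) can be approximated in a few lines. This is a rough sketch, not igvtools itself: the function name `binned_fragment_density`, the read tuple layout, and the reading of the mean as mean per-base coverage within each bin are assumptions made for illustration.

```python
def binned_fragment_density(reads, chrom_length, bin_size=25, fragment_length=200):
    """Mean per-base coverage of extended reads in fixed-size bins.

    reads -- iterable of (start, end, strand) with 0-based half-open
             coordinates and strand '+' or '-' (assumed input layout)
    Each read is extended in its own direction to fragment_length bp,
    clipped to the chromosome, and its coverage is averaged per bin.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    covered_bases = [0] * n_bins

    for start, end, strand in reads:
        if strand == "+":
            frag_start, frag_end = start, max(end, start + fragment_length)
        else:
            frag_start, frag_end = min(start, end - fragment_length), end
        frag_start = max(frag_start, 0)
        frag_end = min(frag_end, chrom_length)

        # Add the overlap between the fragment and every bin it touches.
        for b in range(frag_start // bin_size, (frag_end - 1) // bin_size + 1):
            bin_start = b * bin_size
            bin_end = min(bin_start + bin_size, chrom_length)
            covered_bases[b] += min(frag_end, bin_end) - max(frag_start, bin_start)

    densities = []
    for b in range(n_bins):
        # The last bin may be shorter than bin_size.
        width = min((b + 1) * bin_size, chrom_length) - b * bin_size
        densities.append(covered_bases[b] / width)
    return densities


if __name__ == "__main__":
    example_reads = [(0, 36, "+"), (364, 400, "-"), (10, 46, "+")]
    print(binned_fragment_density(example_reads, chrom_length=500))
```

Extending reads in the strand direction matters because a sequenced read covers only one end of the sheared fragment; the 200 bp extension approximates the full fragment that was immunoprecipitated.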
