
Dataset Information

Open Chromatin by DNaseI HS from ENCODE/OpenChrom(Duke University)

ABSTRACT: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Terry Furey mailto:tsfurey@duke.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). These tracks display DNaseI hypersensitivity (HS) evidence as part of the four Open Chromatin track sets. DNaseI is an enzyme that has long been used to map general chromatin accessibility, and DNaseI "hypersensitivity" is a feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include promoters, enhancers, silencers, insulators, locus control regions, and novel elements. DNaseI hypersensitivity signifies chromatin accessibility following binding of trans-acting factors in place of a canonical nucleosome. Together with FAIRE and ChIP-seq experiments, these tracks display the locations of active regulatory elements identified as open chromatin in multiple cell types (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType) from the Duke, UNC-Chapel Hill, UT-Austin, and EBI ENCODE group. Within this project, open chromatin was identified using two independent and complementary methods: these DNaseI HS assays and Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE), combined with chromatin immunoprecipitation (ChIP) for select regulatory factors. DNaseI HS and FAIRE provide assay cross-validation, with regions identified by both assays delineating the highest-confidence areas of open chromatin. ChIP assays provide functional validation and preliminary annotation of a subset of open chromatin sites. Each method employed Illumina (formerly Solexa) sequencing by synthesis as the detection platform. The Tier 1 and Tier 2 cell types were additionally verified on a second platform: high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen.
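The cross-validation idea above (regions found by both DNaseI HS and FAIRE mark the highest-confidence open chromatin) amounts to intersecting two peak sets. The Python sketch below is illustrative only: it assumes each peak set is a list of half-open (chrom, start, end) tuples whose intervals do not overlap one another within a set, and the function names are hypothetical rather than part of the group's pipeline.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) with half-open coordinates; an assumption for this sketch.
Interval = Tuple[str, int, int]


def group_by_chrom(intervals: List[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals by chromosome and sort each group by start coordinate."""
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        grouped.setdefault(chrom, []).append((start, end))
    for spans in grouped.values():
        spans.sort()
    return grouped


def intersect_peaks(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portions of two peak sets.

    Each input set is assumed to be internally non-overlapping (already merged).
    """
    ga, gb = group_by_chrom(a), group_by_chrom(b)
    shared: List[Interval] = []
    for chrom in sorted(ga.keys() & gb.keys()):
        xs, ys = ga[chrom], gb[chrom]
        i = j = 0
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:
                shared.append((chrom, start, end))
            # Advance whichever interval ends first; it cannot overlap anything later.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return shared


if __name__ == "__main__":
    dnase = [("chr1", 100, 300), ("chr1", 500, 600), ("chr2", 0, 50)]
    faire = [("chr1", 250, 550), ("chr2", 60, 80)]
    # prints [('chr1', 250, 300), ('chr1', 500, 550)]
    print(intersect_peaks(dnase, faire))
```

The two-pointer sweep keeps the cost linear in the number of peaks per chromosome once both sets are sorted.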
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). DNaseI hypersensitive sites were isolated using methods called DNase-seq or DNase-chip (Song and Crawford, 2010; Boyle et al., 2008a; Crawford et al., 2006). Briefly, cells were lysed with NP40, and intact nuclei were digested with optimal levels of DNaseI enzyme. DNaseI-digested ends were captured from three different DNase concentrations, and the material was sequenced using Illumina (Solexa) sequencing. DNase-seq data for Tier 1 and Tier 2 cell lines were verified by comparing multiple independent growths (replicates) and determining the reproducibility of the data. In general, a cell line was considered verified if 80% of the top 50,000 peaks in one replicate were detected among the top 100,000 peaks of a second replicate. For some cell types, additional verification was performed by hybridizing similar material to NimbleGen Human ENCODE tiling arrays (1% of the genome), with the input DNA as reference (DNase-chip). A more detailed protocol is available at http://hgwdev.cse.ucsc.edu/ENCODE/protocols/general/Duke_DNase_protocol.pdf. DNase-seq reads are 20 bases long because of an MmeI cutting step applied to the large (roughly >50 kb) DNA fragments extracted after DNaseI digestion. Sequences from each experiment were aligned to the GRCh37 (hg19) assembly using BWA (Li et al., 2010). The command used for these alignments was:
> bwa aln -t 8 genome.fa s_1.sequence.txt.bfq > s_1.sequence.txt.sai
where genome.fa is the whole genome sequence and s_1.sequence.txt.bfq is one lane of sequences converted into the required bfq format.
Sequences from multiple lanes were combined for a single replicate using the bwa samse command and converted to SAM/BAM format using samtools. Only reads that aligned to four or fewer locations were retained. Reads were also filtered based on their alignment to problematic regions (such as satellites and rRNA genes; see supplemental materials at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeOpenChromDnase/supplemental/). The mappings of these short reads to the genome are available for download at http://hgwdev.cse.ucsc.edu/cgi-bin/hgFileUi?g=wgEncodeOpenChromDnase. The resulting digital signal was converted to a continuous wiggle track using F-Seq, which employs Parzen kernel density estimation to create base-pair scores (Boyle et al., 2008b). Input data have been generated for several cell lines. These are used directly to create a control/background model for F-Seq when generating signal annotations for these cell lines. These models are meant to correct for sequencing biases, alignment artifacts, and copy number changes in these cell lines. Input data were not generated directly for the other cell lines. Instead, a general background model was derived from the available Input data sets. This should correct for sequencing biases and alignment artifacts, but not for cell-type-specific copy number changes. The exact command used for this step is:
> fseq -l 600 -v -f 0 -b <bff files> -p <iff files> alignments.bed
where <bff files> are the background files based on alignability, <iff files> are the background files based on the Input experiments, and alignments.bed is a BED file of filtered sequence alignments. Discrete DNaseI HS sites (peaks) were identified from the DNase-seq F-Seq density signal. Significant regions were determined by fitting the data to a gamma distribution to calculate p-values. Contiguous regions where p-values were below a 0.05/0.01 threshold were considered significant.
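F-Seq turns discrete read positions into a continuous signal with Parzen (kernel) density estimation. The Python sketch below shows the core idea with a Gaussian kernel. It is a simplified illustration, not F-Seq itself: the `bandwidth` and `cutoff` parameters are assumptions of this sketch (F-Seq's own kernel settings are controlled by its options, such as the -l feature length shown above, and are not reproduced here), and the background/Input correction is omitted.

```python
import math
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def parzen_density(cut_sites: Sequence[int], positions: Sequence[int],
                   bandwidth: float, cutoff: float = 4.0) -> List[float]:
    """Gaussian Parzen-window density of read positions, evaluated at `positions`.

    Each read contributes a normal kernel with standard deviation `bandwidth`
    (a hypothetical parameter); reads farther than `cutoff` bandwidths away are ignored.
    """
    sites = sorted(cut_sites)
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    reach = cutoff * bandwidth
    density = []
    for x in positions:
        # Only reads within +/- reach of x contribute noticeably.
        lo = bisect_left(sites, x - reach)
        hi = bisect_right(sites, x + reach)
        total = 0.0
        for s in sites[lo:hi]:
            z = (x - s) / bandwidth
            total += math.exp(-0.5 * z * z)
        density.append(total * norm)
    return density


if __name__ == "__main__":
    reads = [1000, 1010, 1020, 5000]
    signal = parzen_density(reads, range(900, 1200, 50), bandwidth=50.0)
    print([round(v, 5) for v in signal])
```

Truncating the kernel at a few bandwidths keeps each evaluation proportional to the number of nearby reads rather than all reads on the chromosome.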
Data from the high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen were normalized using Tukey biweight normalization, and peaks were called using ChIPOTle (Buck et al., 2005) at multiple levels of significance. Regions matched in size to these peaks but devoid of any significant signal were also created as a null model. These data were used for additional verification of Tier 1 and Tier 2 cell lines by ROC analysis. Files containing these data can be found in the Downloads directory (http://hgwdev.cse.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeOpenChromDnase), labeled Validation view.
Release 1 (April 2011) of this track consists of a remapping of all previously released experiments to the human reference genome GRCh37/hg19 (these data were previously mapped to NCBI36/hg18; please see the Release Notes section of the hg18 Open Chromatin track (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeChromatinMap) for information on the NCBI36/hg18 releases of the data). There are 21 new DNaseI experiments in this release, on 19 new cell lines. New to this release is a reconfiguration of how this track is displayed in relation to other tracks from the Duke/UNC/UT-Austin/EBI group. A synthesis of open chromatin evidence from the three assay types, compiled for Tier 1 and 2 cell lines plus NHEK, will also be added in this release and can be previewed in Open Chromatin Synthesis (http://genome-preview.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeOpenChromSynth). Enhancer and Insulator Functional assays: A subset of DNase and FAIRE regions were cloned into functional tissue culture reporter assays to test for enhancer and insulator activity. Coordinates and results from these experiments can be found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeOpenChromDnase/supplemental/.
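Tukey biweight normalization of the tiling-array data centers log-ratios on a robust, outlier-resistant location estimate. The Python sketch below computes a one-step Tukey biweight location of log2(sample/input) ratios and subtracts it from every probe. The tuning constant c = 5 and the small epsilon follow a common convention (used, for example, in Affymetrix MAS5 signal estimation) and are assumptions here; the exact NimbleGen/ChIPOTle preprocessing may differ.

```python
import math
from statistics import median
from typing import List, Sequence


def tukey_biweight(values: Sequence[float], c: float = 5.0, eps: float = 1e-4) -> float:
    """One-step Tukey biweight location estimate."""
    m = median(values)
    mad = median(abs(v - m) for v in values)  # median absolute deviation
    num = den = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        if abs(u) < 1.0:  # points farther than about c * MAD get zero weight
            w = (1.0 - u * u) ** 2
            num += w * v
            den += w
    return num / den if den > 0 else m


def normalize_log_ratios(sample: Sequence[float], reference: Sequence[float],
                         c: float = 5.0) -> List[float]:
    """log2(sample / reference) per probe, centred on its Tukey biweight location."""
    ratios = [math.log2(s / r) for s, r in zip(sample, reference)]
    center = tukey_biweight(ratios, c)
    return [x - center for x in ratios]


if __name__ == "__main__":
    chip = [200.0, 410.0, 395.0, 5000.0]   # hypothetical probe intensities
    input_dna = [100.0, 200.0, 210.0, 100.0]
    print(normalize_log_ratios(chip, input_dna))
```

Unlike a plain mean, the biweight ignores the few strongly enriched probes when estimating the center, so true peaks are not pulled toward zero by normalization.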

ORGANISM(S): Homo sapiens

SUBMITTER: ENCODE DCC

PROVIDER: E-GEOD-32970 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Publications

Predicting cell-type-specific gene expression from regions of open chromatin.

Natarajan A, Yardimci GG, Sheffield NC, Crawford GE, Ohler U

Genome Research. 2012 Sep;22(9).

Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data ...

PMID: 22955983

Similar Datasets

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Terry Furey mailto:tsfurey@duke.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). These tracks display Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) evidence as part of the four Open Chromatin track sets. FAIRE is a method to isolate and identify nucleosome-depleted regions of the genome. FAIRE was initially discovered in yeast and subsequently shown to identify active regulatory elements in human cells (Giresi et al., 2007). Similar to DNaseI HS, FAIRE appears to identify functional regulatory elements that include promoters, enhancers, silencers, insulators, locus control regions and novel elements. Together with DNaseI HS and ChIP-seq experiments, these tracks display the locations of active regulatory elements identified as open chromatin in multiple cell types (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?type=cellType) from the Duke, UNC-Chapel Hill, UT-Austin, and EBI ENCODE group. Within this project, open chromatin was identified using two independent and complementary methods: DNaseI hypersensitivity (HS) and these FAIRE assays, combined with chromatin immunoprecipitation (ChIP) for select regulatory factors. DNaseI HS and FAIRE provide assay cross-validation with commonly identified regions delineating the highest confidence areas of open chromatin. ChIP assays provide functional validation and preliminary annotation of a subset of open chromatin sites. Each method employed Illumina (formerly Solexa) sequencing by synthesis as the detection platform. The Tier 1 and Tier 2 cell types were additionally verified by a second platform, high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen.
Release 1 (March 2011) of this track consists of a remapping of all previously released experiments to the human reference genome GRCh37/hg19 (these data were previously mapped to NCBI36/hg18; please see the Release Notes section of the hg18 Open Chromatin track (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeChromatinMap) for information on the NCBI36/hg18 releases of the data).
- There are 12 new FAIRE experiments in this release, on 10 new cell lines.
- New to this release is a reconfiguration of how this track is displayed in relation to other tracks from the Duke/UNC/UT-Austin/EBI group.
- A synthesis of open chromatin evidence from the three assay types, compiled for Tier 1 and 2 cell lines plus NHEK, will also be added in this release and can be previewed in Open Chromatin Synthesis (http://genome-preview.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeOpenChromSynth).
- Enhancer and Insulator Functional assays: A subset of DNase and FAIRE regions were cloned into functional tissue culture reporter assays to test for enhancer and insulator activity. Coordinates and results from these experiments can be found at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeOpenChromFaire/supplemental/.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell). FAIRE was performed (Giresi et al., 2007) by cross-linking proteins to DNA using 1% formaldehyde solution and shearing the complex by sonication. Phenol/chloroform extractions were performed to remove DNA fragments cross-linked to protein. The DNA recovered in the aqueous phase was sequenced using an Illumina (Solexa) sequencing system.
FAIRE-seq data for Tier 1 and Tier 2 cell lines were verified by comparing multiple independent growths (replicates) and determining the reproducibility of the data. For some cell types, additional verification was performed by hybridizing the same material to NimbleGen Human ENCODE tiling arrays (1% of the genome), with the input DNA as reference (FAIRE-chip). A more detailed protocol is available at http://hgwdev.cse.ucsc.edu/ENCODE/protocols/general/FAIRE_UNC_procedure.pdf. Also see Giresi et al., 2009. DNA fragments isolated by FAIRE are 100-200 bp in length, with an average length of 134 bp. Sequences from each experiment were aligned to the GRCh37 (hg19) assembly using BWA (Li et al., 2010). The command used for these alignments was:
> bwa aln -t 8 genome.fa s_1.sequence.txt.bfq > s_1.sequence.txt.sai
where genome.fa is the whole genome sequence and s_1.sequence.txt.bfq is one lane of sequences converted into the required bfq format. Sequences from multiple lanes were combined for a single replicate using the bwa samse command and converted to SAM/BAM format using samtools. Only reads that aligned to four or fewer locations were retained. Reads were also filtered based on their alignment to problematic regions (such as satellites and rRNA genes; see supplemental materials at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeOpenChromFaire/supplemental/). The mappings of these short reads to the genome are available for download at http://hgwdev.cse.ucsc.edu/cgi-bin/hgFileUi?g=wgEncodeOpenChromFaire. The resulting digital signal was converted to a continuous wiggle track using F-Seq, which employs Parzen kernel density estimation to create base-pair scores (Boyle et al., 2008b). Input data have been generated for several cell lines. These are used directly to create a control/background model for F-Seq when generating signal annotations for these cell lines.
These models are meant to correct for sequencing biases, alignment artifacts, and copy number changes in these cell lines. Input data were not generated directly for the other cell lines. Instead, a general background model was derived from the available Input data sets. This should correct for sequencing biases and alignment artifacts, but not for cell-type-specific copy number changes. The exact command used for this step is:
> fseq -l 800 -v -b <bff files> -p <iff files> alignments.bed
where <bff files> are the background files based on alignability, <iff files> are the background files based on the Input experiments, and alignments.bed is a BED file of filtered sequence alignments. Discrete FAIRE sites (peaks) were identified from the FAIRE-seq F-Seq density signal. Significant regions were determined by fitting the data to a gamma distribution to calculate p-values. Contiguous regions where p-values were below a 0.05/0.01 threshold were considered significant. Data from the high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen were normalized using Tukey biweight normalization, and peaks were called using ChIPOTle (Buck et al., 2005) at multiple levels of significance. Regions matched in size to these peaks but devoid of any significant signal were also created as a null model. These data were used for additional verification of Tier 1 and Tier 2 cell lines by ROC analysis. Files containing these data can be found in the Downloads directory labeled Validation view (http://hgwdev.cse.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeOpenChromFaire).
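The peak-calling step described above (fit a gamma distribution to the density signal, turn each position into an upper-tail p-value, and keep contiguous runs below a threshold) can be sketched in pure Python as follows. Fitting by the method of moments and the 0.01 default threshold are assumptions of this sketch; the source does not state how the gamma parameters were estimated. The incomplete-gamma routines are the standard series and continued-fraction expansions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _lower_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by series (used for x < a + 1)."""
    ap, term = a, 1.0 / a
    total = term
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_contfrac(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def gamma_sf(x: float, shape: float, scale: float) -> float:
    """P(X >= x) for X ~ Gamma(shape, scale)."""
    z = x / scale
    if z <= 0.0:
        return 1.0
    if z < shape + 1.0:
        return 1.0 - _lower_series(shape, z)
    return _upper_contfrac(shape, z)


def call_regions(scores: Sequence[float], alpha: float = 0.01) -> List[Tuple[int, int]]:
    """Half-open [start, end) index runs whose gamma upper-tail p-value is below alpha."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    if mean <= 0 or var <= 0:
        return []
    shape, scale = mean * mean / var, var / mean  # method-of-moments fit (assumption)
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        significant = gamma_sf(s, shape, scale) < alpha
        if significant and start is None:
            start = i
        elif not significant and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, n))
    return regions


if __name__ == "__main__":
    density = [1.0] * 100 + [20.0] * 3 + [1.0] * 100  # hypothetical F-Seq scores
    print(call_regions(density))
```

A production caller would fit the gamma on background positions only and merge runs separated by small gaps; both refinements are left out to keep the sketch short.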

Project description: This track displays a chromatin state segmentation for each of nine human cell types (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=GM12878,H1-hESC,HepG2,HUVEC,HMEC,HSMM,K562,NHEK,NHLF). A common set of states across the cell types was learned by computationally integrating ChIP-seq data for nine factors plus input (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=CTCF,H3K4me1,H3K4me2,H3K4me3,H3K27ac,H3K9ac,H3K36me3,H4K20me1,H3K27me3,Input) using a Hidden Markov Model (HMM). In total, fifteen states were used to segment the genome, and these states were then grouped and colored to highlight predicted functional elements. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. ChIP-seq data from the Broad Histone (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeBroadChipSeq) track was used to generate this track. Data for nine factors plus input (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=CTCF,H3K4me1,H3K4me2,H3K4me3,H3K27ac,H3K9ac,H3K36me3,H4K20me1,H3K27me3,Input) and nine cell types (http://hgwdev.cse.ucsc.edu/cgi-bin/hgEncodeVocab?term=GM12878,H1-hESC,HepG2,HUVEC,HMEC,HSMM,K562,NHEK,NHLF) were binarized separately at 200 base pair resolution based on a Poisson background model. The chromatin states were learned from this binarized data using a multivariate Hidden Markov Model (HMM) that explicitly models the combinatorial patterns of observed modifications (Ernst and Kellis, 2010). To learn a common set of states across the nine cell types, the genomes were first concatenated across the cell types. For each of the nine cell types, each 200 base pair interval was then assigned to its most likely state under the model. Detailed information about the model parameters and state enrichments can be found in Ernst et al. (accepted).
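The binarization step described above can be sketched as follows: reads are counted in 200 bp bins, and a bin is marked present (1) when its count is improbably high under a Poisson background whose mean is the average count per bin. This Python sketch is a simplification: the 1e-4 p-value cutoff is an assumption (a value commonly used for ChromHMM-style binarization), the function names are hypothetical, and the adjustment for the Input control is omitted.

```python
import math
from typing import List, Sequence


def bin_counts(read_starts: Sequence[int], chrom_length: int, bin_size: int = 200) -> List[int]:
    """Count reads whose start position falls in each consecutive bin_size-bp bin."""
    counts = [0] * ((chrom_length + bin_size - 1) // bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(counts: Sequence[int], p_cutoff: float = 1e-4) -> List[int]:
    """1 where a bin's count is significant against the mean count per bin, else 0."""
    lam = sum(counts) / len(counts)
    return [1 if poisson_sf(c, lam) < p_cutoff else 0 for c in counts]


if __name__ == "__main__":
    counts = bin_counts([5, 30, 210, 610, 615, 620, 630, 640, 650, 660, 670, 680], 2000)
    print(counts)
    print(binarize(counts))
```

The 1 - CDF form is adequate for the small per-bin means typical of ChIP-seq data at 200 bp resolution; very deep data would call for summing the upper tail directly.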
This is release 1 (Jun 2011) of this track, and it is based on the NCBI36/hg18 release of the Broad Histone (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg18&g=wgEncodeBroadChipSeq) track. This track has also been lifted over to GRCh37/hg19 (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeBroadHmm). It is anticipated that the HMM methods will be run on the newer GRCh37/hg19 Broad Histone (http://hgwdev.cse.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeBroadHistone) data and will replace the lifted version.

Project description: This track is produced as part of the mouse ENCODE Project. This track shows DNaseI sensitivity measured genome-wide in mouse tissues and cell lines using the Digital DNaseI methodology (see below), and DNaseI hypersensitive sites. DNaseI has long been used to map general chromatin accessibility, and DNaseI hypersensitivity is a universal feature of active cis-regulatory sequences. The use of this method has led to the discovery of functional regulatory elements that include enhancers, insulators, promoters, locus control regions and novel elements. For each experiment (tissue/cell type), this track shows DNaseI sensitivity as a continuous function using sequencing tag density (Signal), and discrete loci of DNaseI sensitive zones (HotSpots) and hypersensitive sites (Peaks). For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Fresh tissues were harvested from mice and the nuclei prepared according to the tissue-appropriate protocol (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Digital DNaseI was performed by DNaseI digestion of intact nuclei, isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Illumina IIx (and Illumina HiSeq by early 2011) platform (36 bp reads). High-quality reads were mapped to the genome using the Bowtie aligner, and only uniquely mapping reads were kept. DNaseI sensitivity is directly reflected in raw tag density, which is shown in the track as the density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI sensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004).
False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% FDR hypersensitive zones using a peak-finding algorithm (I-max).
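The FDR thresholding idea above (compare hotspot scores from real reads with scores obtained from an equal number of randomly placed reads) can be illustrated with a generic empirical-FDR cutoff search. This Python sketch is not the HotSpot algorithm itself: it assumes each observed and random hotspot already has a single numeric score (hypothetical inputs), and it returns the lowest score cutoff at which the estimated FDR (random hotspots passing divided by observed hotspots passing) is at most 1%.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def count_at_least(sorted_scores: List[float], t: float) -> int:
    """Number of scores >= t in an ascending-sorted list."""
    return len(sorted_scores) - bisect_left(sorted_scores, t)


def empirical_fdr_threshold(observed: Sequence[float], null: Sequence[float],
                            target_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed score t with (#null >= t) / (#observed >= t) <= target_fdr."""
    obs = sorted(observed)
    nul = sorted(null)
    for t in obs:  # candidate cutoffs, lowest first
        n_obs = count_at_least(obs, t)  # at least 1, because t itself is observed
        n_null = count_at_least(nul, t)
        if n_null / n_obs <= target_fdr:
            return t
    return None  # no cutoff reaches the target FDR


if __name__ == "__main__":
    real_scores = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical hotspot scores
    random_scores = [1.0, 2.0, 3.0]                  # hypothetical scores from random reads
    print(empirical_fdr_threshold(real_scores, random_scores))
```

Because the random reads match the real library in number, no rescaling of the null counts is needed; with unequal sizes the ratio would have to be weighted accordingly.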

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Richard Sandstrom mailto:sull@u.washington.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the mouse ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique relies on the strength and tightness of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA-degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique was adopted to define the DNase hypersensitive sites in the genome. The DNase samples were sequenced on next-generation sequencing machines to significantly higher depths (300-fold or greater), producing base-pair resolution DNase susceptibility maps of the native chromatin state. These base-pair resolution maps depend on the nature and specificity of the interactions between the DNA and the regulatory/modulatory proteins bound at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA binding proteins.
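The footprinting principle described above (bound DNA is protected from DNaseI, so cleavage dips at the binding site relative to its flanks) can be illustrated with a simple depletion ratio over per-base cleavage counts. The score, the 10 bp flank width, and the pseudocount are hypothetical choices made for this Python sketch; it is not the scoring method used by the ENCODE footprinting analyses.

```python
from typing import Sequence


def footprint_depletion(cuts: Sequence[int], core_start: int, core_end: int,
                        flank: int = 10, pseudocount: float = 1.0) -> float:
    """Mean cleavage inside [core_start, core_end) divided by mean cleavage in the flanks.

    Values well below 1 suggest a protected (footprinted) core.
    """
    core = list(cuts[core_start:core_end])
    left = list(cuts[max(0, core_start - flank):core_start])
    right = list(cuts[core_end:core_end + flank])
    flanks = left + right
    if not core or not flanks:
        raise ValueError("core and flanks must each contain at least one base")
    core_mean = sum(core) / len(core)
    flank_mean = sum(flanks) / len(flanks)
    return (core_mean + pseudocount) / (flank_mean + pseudocount)


if __name__ == "__main__":
    # Hypothetical per-base DNaseI cleavage counts with a protected 10 bp core.
    profile = [10] * 10 + [0] * 10 + [10] * 10
    print(footprint_depletion(profile, 10, 20))
```

The pseudocount keeps the ratio finite when the flanks have no cleavage, which is common at low depth and one reason footprinting needs the very deep sequencing described above.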
Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolating DNaseI "double-hit" fragments (Sabo et al., 2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the NCBI37/mm9 mouse genome using Bowtie 0.12.5; only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Raw Signal), which is shown in the track as density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm (Sabo et al., 2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags. Results were validated by conventional DNaseI hypersensitivity assays using end-labeling/Southern blotting methods.
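The raw signal described above counts tags in a 150 bp window moved along the genome in 20 bp steps. A minimal Python sketch, assuming tag positions are 0-based integers on a single chromosome and that each window is reported by its start coordinate (the original track may instead report window centres):

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def sliding_window_density(tag_positions: Sequence[int], chrom_length: int,
                           window: int = 150, step: int = 20) -> List[Tuple[int, int]]:
    """(window_start, tag_count) for half-open windows [start, start + window)."""
    tags = sorted(tag_positions)
    densities = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        lo = bisect_left(tags, start)           # first tag at or after the window start
        hi = bisect_left(tags, start + window)  # first tag at or after the window end
        densities.append((start, hi - lo))
    return densities


if __name__ == "__main__":
    print(sliding_window_density([10, 20, 160], chrom_length=200))
```

Sorting the tags once and using binary search keeps each window count logarithmic in the number of tags, so overlapping windows cost little extra.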

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Richard Sandstrom mailto:sull@u.washington.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique relies on the strength and tightness of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA-degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique was adopted to define the DNase hypersensitive sites in the genome. Sequencing these DNase samples to significantly higher depths (300-fold or greater) produces base-pair resolution DNase susceptibility maps of the native chromatin state. These base-pair resolution maps depend on the nature and specificity of the interactions between the DNA and the regulatory/modulatory proteins bound at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA binding proteins.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the GRCh37/hg19 human genome using Bowtie 0.12.5 (Eland was used to map to NCBI36/hg18); only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as the density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags.
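Percent Tags in Hotspots (PTIH), the quality measure used above to select libraries for deep sequencing, is the share of all mapped tags that fall inside hotspot regions. A minimal Python sketch, assuming tags are (chrom, position) pairs and hotspots are merged, non-overlapping half-open (chrom, start, end) intervals; the function name is hypothetical:

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple


def percent_tags_in_hotspots(tags: Sequence[Tuple[str, int]],
                             hotspots: Sequence[Tuple[str, int, int]]) -> float:
    """Percentage of tags whose position lies inside any hotspot interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in hotspots:
        by_chrom.setdefault(chrom, []).append((start, end))
    for spans in by_chrom.values():
        spans.sort()
    starts = {chrom: [s for s, _ in spans] for chrom, spans in by_chrom.items()}

    inside = 0
    for chrom, pos in tags:
        spans = by_chrom.get(chrom)
        if not spans:
            continue
        i = bisect_right(starts[chrom], pos) - 1  # last hotspot starting at or before pos
        if i >= 0 and pos < spans[i][1]:
            inside += 1
    return 100.0 * inside / len(tags) if tags else 0.0


if __name__ == "__main__":
    reads = [("chr1", 5), ("chr1", 15), ("chr1", 25), ("chr2", 5)]
    spots = [("chr1", 0, 10), ("chr1", 20, 30)]
    print(percent_tags_in_hotspots(reads, spots))
```

Because the hotspots are assumed to be merged, only the nearest hotspot to the left of each tag needs to be checked.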

Project description: This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Yijun Ruan mailto:ruanyj@gis.a-star.edu.sg). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). This track, produced as part of the ENCODE Project, contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints). Footprinting is a technique used to define the DNA sequences that interact with and bind DNA-binding proteins, such as transcription factors, zinc-finger proteins, hormone-receptor complexes, and other chromatin-modulating factors like CTCF. The technique relies on the strength and tightness of protein-DNA interactions. In their native chromatin state, DNA sequences that interact directly with DNA-binding proteins are relatively protected from DNA-degrading endonucleases, while the exposed/unbound portions are readily degraded by such endonucleases. A massively parallel next-generation sequencing technique was adopted to define the DNase hypersensitive sites in the genome. Sequencing these DNase samples to significantly higher depths (300-fold or greater) produces base-pair resolution DNase susceptibility maps of the native chromatin state. These base-pair resolution maps depend on the nature and specificity of the interactions between the DNA and the regulatory/modulatory proteins bound at specific loci in the genome; thus they represent the native chromatin state of the genome under investigation. The deep sequencing approach has been used to define the footprint landscape of the genome by identifying DNA motifs that interact with known or novel DNA binding proteins.
For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Cells were grown according to the approved ENCODE cell culture protocols. Digital DNaseI was performed by DNaseI digestion of intact nuclei, followed by isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Solexa platform (27 bp reads). High-quality reads were mapped to the GRCh37/hg19 human genome using Bowtie 0.12.5 (Eland was used to map to NCBI36/hg18); only unique mappings were kept. DNaseI sensitivity is directly reflected in raw tag density (Signal), which is shown in the track as the density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI hypersensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36-mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within 1.0% (FDR 0.01) hypersensitive zones using a peak-finding algorithm. Only DNase Solexa libraries from unique cell types producing the highest quality data, as defined by Percent Tags in Hotspots (PTIH ~40%), were designated for deep sequencing to a depth of over 200 million tags.
