Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Rationale for the Mouse ENCODE project Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on our understanding of the evolution of gene regulation. Maps of histone modifications: Levels of three histone modifications are being determined. H3K4me1 (monomethylation of lysine 4 of histone H3) is a mark for active chromatin and, in the absence of H3K4me3, is one indicator of an enhancer. H3K4me3 (trimethylation of lysine 4 of histone H3) is highly enriched at active promoters. One repressive (Polycomb) mark, H3K27me3, is associated with some silenced genes. Maps of genomic DNA in chromatin with these histone modifications are generated by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009); a graph of the enrichment for histone modifications is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the deduced probable binding sites from the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Rationale for the Mouse ENCODE project Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on our understanding of the evolution of gene regulation. Maps of Occupancy by Transcription Factors: Maps of occupancy of genomic DNA by transcription factors (TFs) are determined by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009); a graph of the enrichment for TF binding is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the deduced probable binding sites from the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Project description:Rationale for the Mouse ENCODE project: Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on our understanding of the evolution of gene regulation.
Maps of Occupancy by Transcription Factors: Maps of occupancy of genomic DNA by transcription factors (TFs) are determined by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009); a graph of the enrichment for TF binding is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the deduced probable binding sites from the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). The chromatin immunoprecipitation followed published methods (Welch et al. 2004). Information on antibodies used is available via the hyperlinks in the "Select subtracks" menu. Samples passing initial quality thresholds (enrichment and depletion for positive and negative controls - if available - by quantitative PCR of ChIP material) are processed for library construction for Illumina sequencing, using the ChIP-seq Sample Preparation Kit purchased from Illumina.
Starting with a 10 ng sample of ChIP DNA, DNA fragments were repaired to generate blunt ends and a single A nucleotide was added to each end. Double-stranded Illumina adaptors were ligated to the fragments. Ligation products were amplified by 18 cycles of PCR, and the DNA between 250 and 350 bp was gel purified. Completed libraries were quantified with Quant-iT dsDNA HS Assay Kit. The DNA library was sequenced on the Illumina Genome Analyzer II sequencing system, and more recently on the HiSeq. Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples, except time course samples, are being assayed in biological replicates. The data displayed are from the pooled reads for all replicates, but individual replicates are available by download. The resulting 36-nucleotide sequence reads (fastq files) were moved to a data library in Galaxy, and the tools implemented in Galaxy were used for further processing via workflows (Blankenberg et al. 2010). The reads were mapped to the mouse genome (mm9 assembly) using the program bowtie (Langmead et al. 2009), and the files of mapped reads for the ChIP sample and from the "input" control (no antibody) were processed by MACS (Zhang et al. 2008) to call peaks for occupancy by transcription factors, using the parameters mfold=15, bandwidth=125. Per-replicate alignments and sequences are available for download on the downloads page (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodePsuTfbs/).
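The idea of comparing ChIP tag counts with the "input" control can be illustrated with a minimal sketch. This is not the MACS algorithm and does not use the mfold or bandwidth parameters; it is a simplified per-window Poisson test under stated assumptions. The function names, the fixed windows, the floor on the expected count, and the alpha threshold are all hypothetical choices made for illustration.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # pmf at 0
    cdf = term
    for i in range(1, k):
        term *= lam / i    # pmf at i, from pmf at i - 1
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(chip, control, chip_total, control_total, alpha=1e-5):
    """Return indices of windows whose ChIP count exceeds the scaled input.

    chip, control: per-window tag counts (hypothetical inputs)
    chip_total, control_total: library sizes used to scale the control
    alpha: illustrative significance threshold (an assumption)
    """
    scale = chip_total / control_total
    hits = []
    for i, (c, ctl) in enumerate(zip(chip, control)):
        lam = max(ctl * scale, 1.0)  # floor avoids a zero expected count
        if poisson_sf(c, lam) < alpha:
            hits.append(i)
    return hits
```

A call such as enriched_windows(chip_counts, input_counts, n_chip, n_input) returns the window indices that pass the threshold; a real peak caller additionally models fragment size and local background.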

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. 
The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. One of the epigenetic features most closely related to genomic activity is the production of stable RNA, including transcripts from both protein-coding genes and noncoding transcripts. These genomic compilations of transcripts, or transcriptomes, are primary determinants of the way cells function, respond and differentiate, both by the production of proteins translated from coding transcripts and the regulatory activity of untranslated non-coding transcripts. Non-coding RNAs regulate gene expression through diverse mechanisms ranging from reducing chromatin accessibility (affecting large regions or whole chromosomes) to precise fine-tuning of transcription from specific genes, e.g. via RNAi. Even though a large proportion of mammalian genomes is transcribed, many of the transcribed segments have yet to be assigned any function. The ENCODE project aims to create a comprehensive, quantitative annotation of the human transcriptome in several cell and tissue types as well as to understand regulation of transcriptomes by establishing the relationship between regulatory factors and their targets. Mapping the mouse transcriptome in similar tissues will allow us to discern conservation of transcriptome profiles between mouse and human, to discover species-specific transcription patterns, and to infer conserved versus species-specific regulatory mechanisms. The results will have a significant impact on our understanding of the evolution of gene regulation. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Total RNA was extracted from 5-10 million cells using TRIzol reagent.
This was followed by mRNA selection, fragmentation and cDNA synthesis, which were performed as described previously (Mortazavi et al., 2009). Double-stranded cDNA samples were processed for library construction for Illumina sequencing, using the Illumina ChIP-seq Sample Preparation Kit. Strand-specific libraries were generated in a similar manner, except for a couple of modifications described previously (Parkhomchuk et al., 2009). Briefly, instead of dTTP, dUTP was used during second-strand cDNA synthesis to label the second-strand cDNA. During library preparation, the dUTP-labeled cDNA was treated with uracil-N-glycosylase prior to the PCR amplification step. This was done to remove uracil from the second strand, following which the DNA was subjected to high heat to facilitate abasic scission of the second strand. Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples are considered as biological replicates. Sequencing was done on the Illumina Genome Analyzer IIx and on the Illumina HiSeq 2000. FastQ files for the resulting sequence reads (single read and paired-end, directional and non-directional) were moved to a data library in Galaxy, and tools implemented in Galaxy were used for further processing via workflows (Giardine et al., 2005; Blankenberg et al., 2010; Goecks et al., 2010). Data processing was also performed on the CyberSTAR high-performance computing system at Penn State. The reads were mapped to the mouse genome (mm9 assembly) using the program TopHat (Langmead et al., 2009; Trapnell et al., 2009). Signal tracks were created using BEDtools (Quinlan et al., 2010) and SAMtools (Li, Handsaker et al., 2009).

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Rationale for the Mouse ENCODE project: Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these features are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. Such changes are inferred to be under negative or positive selection, respectively, and interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
The comparison will be used to discover which epigenetic features are conserved between mouse and human, and to examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on the understanding of the evolution of gene regulation. Maps of DNaseI Sensitivity: DNaseI has long been used to map general chromatin accessibility, and DNaseI hypersensitivity is a universal feature of active cis-regulatory sequences. Maps of DNaseI sensitivity measured genome-wide are generated through DNaseI digestion, addition of linkers at the sites of cleavage, and library prep followed by massively parallel short read sequencing on the Illumina GAIIx and HiSeq platforms. The sequence tags are mapped back to the mouse genome, and a graph of the smoothed kernel density of DNaseI cleavage sites is displayed as the "Signal" track. This provides a quantitative estimate of the frequency of cleavage by DNaseI in the initial digest, which in turn is related to the accessibility of the DNA in the chromatin. Segments of greatest cleavage site density represent DNase hypersensitive sites (DHSs) and are identified as peaks by the F-seq program (Boyle et al. 2008). DHSs are candidates for any cis-regulatory module, including promoters, enhancers, insulators, and novel elements. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Rationale for the Mouse ENCODE project Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on our understanding of the evolution of gene regulation. Maps of histone modifications: Levels of three histone modifications are being determined. H3K4me1 (monomethylation of lysine 4 of histone H3) is a mark for active chromatin and, in the absence of H3K4me3, is one indicator of an enhancer. H3K4me3 (trimethylation of lysine 4 of histone H3) is highly enriched at active promoters. One repressive (Polycomb) mark, H3K27me3, is associated with some silenced genes. Maps of genomic DNA in chromatin with these histone modifications are generated by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009); a graph of the enrichment for histone modifications is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the deduced probable binding sites from the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download. Cells were grown according to the approved ENCODE cell culture protocols.
The chromatin immunoprecipitation followed published methods (Welch et al. 2004). Information on antibodies used is available via the hyperlinks in the "Select subtracks" menu. Samples passing initial quality thresholds (enrichment and depletion for positive and negative controls - if available - by quantitative PCR of ChIP material) are processed for library construction for Illumina sequencing, using the ChIP-seq Sample Preparation Kit purchased from Illumina. Starting with a 10 ng sample of ChIP DNA, DNA fragments were repaired to generate blunt ends and a single A nucleotide was added to each end. Double-stranded Illumina adaptors were ligated to the fragments. Ligation products were amplified by 18 cycles of PCR, and the DNA between 250 and 350 bp was gel purified. Completed libraries were quantified with Quant-iT dsDNA HS Assay Kit. The DNA library was sequenced on the Illumina Genome Analyzer II sequencing system, and more recently on the HiSeq. Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples, except time course samples, are being assayed in biological replicates. The data displayed are from the pooled reads for all replicates, but individual replicates are available by download. The resulting 36-nucleotide sequence reads (fastq files) were moved to a data library in Galaxy, and the tools implemented in Galaxy were used for further processing via workflows (Blankenberg et al. 2010). The reads were mapped to the mouse genome (mm9 assembly) using the program bowtie (Langmead et al. 2009), and the files of mapped reads for the ChIP sample and from the "input" control (no antibody) were processed by MACS (Zhang et al. 2008) to call peaks for occupancy by transcription factors, using the parameters mfold=15, bandwidth=125.
Because the signal for some histone modifications is not expected to be tightly localized (compared to a transcription factor), peak calling programs may not be appropriate. Thus, we also provide wiggle tracks with tag counts for every 10 bp segment. Per-replicate alignments and sequences are available for download on the downloads page.
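The 10 bp tag counts described above can be sketched as follows. This is a minimal illustration, assuming the input is a list of 0-based read start coordinates on one chromosome; the function names are hypothetical and the actual tracks were produced by the workflow described in this record, not by this code.

```python
from collections import Counter

BIN_SIZE = 10  # bp, matching the 10 bp segments described above


def bin_tag_counts(read_starts, bin_size=BIN_SIZE):
    """Count mapped tags per fixed-width bin on one chromosome.

    read_starts: iterable of 0-based start coordinates (hypothetical input).
    Returns a dict mapping 0-based bin start -> tag count, sorted by position.
    """
    counts = Counter((pos // bin_size) * bin_size for pos in read_starts)
    return dict(sorted(counts.items()))


def to_wiggle_lines(chrom, counts, bin_size=BIN_SIZE):
    """Render non-empty bins as variableStep wiggle lines (1-based starts)."""
    lines = [f"variableStep chrom={chrom} span={bin_size}"]
    lines += [f"{start + 1} {n}" for start, n in counts.items()]
    return lines
```

For example, read starts 0, 3 and 9 fall in the first bin, so the first data line of the wiggle output for those reads is "1 3".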

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Rationale for the Mouse ENCODE project: Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these features are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. Such changes are inferred to be under negative or positive selection, respectively, and interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
The comparison will be used to discover which epigenetic features are conserved between mouse and human, and to examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on the understanding of the evolution of gene regulation. Maps of DNaseI Sensitivity: DNaseI has long been used to map general chromatin accessibility, and DNaseI hypersensitivity is a universal feature of active cis-regulatory sequences. Maps of DNaseI sensitivity measured genome-wide are generated through DNaseI digestion, addition of linkers at the sites of cleavage, and library prep followed by massively parallel short read sequencing on the Illumina GAIIx and HiSeq platforms. The sequence tags are mapped back to the mouse genome, and a graph of the smoothed kernel density of DNaseI cleavage sites is displayed as the "Signal" track. This provides a quantitative estimate of the frequency of cleavage by DNaseI in the initial digest, which in turn is related to the accessibility of the DNA in the chromatin. Segments of greatest cleavage site density represent DNase hypersensitive sites (DHSs) and are identified as peaks by the F-seq program (Boyle et al. 2008). DHSs are candidates for any cis-regulatory module, including promoters, enhancers, insulators, and novel elements. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown and harvested according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse) for G1E and G1E-ER4.
DNaseI hypersensitive sites were isolated using methods called DNase-seq or DNase-chip (Song and Crawford, 2010). Briefly, cells were lysed with NP40, and intact nuclei were digested with optimal levels of DNaseI enzyme. DNaseI-digested ends were captured from three different DNase concentrations, and material was sequenced using Illumina sequencing. Reads from DNase-seq are 20 bases long because of an MmeI cutting step applied to the approximately 50 kb DNA fragments extracted after DNaseI digestion. Sequences from each experiment were mapped to the mouse genome (mm9 assembly) using the program Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) (Langmead et al., 2009). Reads mapping to more than one location were not removed; for such reads, only the best mapping result was used ("--best" option). Sequences from multiple lanes were combined for a single replicate and converted to the SAM/BAM format using SAMtools (http://samtools.sourceforge.net/). Using F-seq, the resulting digital signal was converted to a continuous wiggle track that employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008). Discrete DNaseI HS sites (peaks) were identified from the DNase-seq F-seq density signal. Significant regions were determined by fitting the data to a gamma distribution to calculate p-values.
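The conversion of discrete cleavage sites into a continuous density can be sketched as a Parzen (kernel) density estimate. This is a minimal illustration and not the F-seq implementation: the Gaussian kernel, the bandwidth value, and the function name are assumptions, and the gamma-distribution significance step is not shown.

```python
import math


def gaussian_kernel_density(cut_sites, start, end, bandwidth=50.0):
    """Parzen density estimate of cleavage sites at each base in [start, end).

    cut_sites: positions of DNaseI cleavage (hypothetical input)
    bandwidth: Gaussian standard deviation in bp (illustrative value)
    Returns a list with one density value per base pair.
    """
    n = len(cut_sites)
    if n == 0:
        return [0.0] * (end - start)
    # Normalization so that the density integrates to 1 over all positions.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in range(start, end):
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in cut_sites)
        density.append(norm * total)
    return density
```

Bases close to clusters of cleavage sites receive higher scores than isolated sites, which is the property that lets peaks be called from the smoothed signal. This direct form costs one kernel evaluation per site per base, so it is suitable only for small regions.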

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Barbara Wold mailto:woldb@caltech.edu, Georgi K. Marinov mailto:georgi@caltech.edu, Diane Trout mailto:diane@caltech.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
Thus, we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on our understanding of the evolution of gene regulation. Maps of Occupancy by Transcription Factors Genome-wide occupancy maps of transcription factors (TFs) are generated by ChIP-seq. A ChIP-seq experiment combines a chromatin immunoprecipitation (ChIP) experiment that enriches genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibody) with high-throughput short-read sequencing of the enriched DNA fragments (Wold & Myers, 2008). Proteins are crosslinked to DNA (usually with formaldehyde), and chromatin is sheared and immunoprecipitated with the antibody of interest. The immunoprecipitated material is turned into a sequencing library and sequenced. The sequencing reads are then aligned to the genome. A control sample consisting of sonicated chromatin that either has not been immunoprecipitated or has been immunoprecipitated with a non-specific immunoglobulin is also sequenced. The ChIP and the control datasets are analyzed with a variety of software packages to identify regions occupied by the target protein. The sequencing data, alignments and analysis files for these experiments are available for download. Specifically, the Ren lab examined RNA polymerase II (PolII), co-activator protein p300, the insulator protein CTCF, and two chromatin modification marks, H3K4me3 and H3K4me1, because of their demonstrated utility in identifying promoters, enhancers and insulator elements (Barski et al., 2007; Blow et al., 2010; Heintzman et al., 2009; Kim et al., 2007; Kim et al., 2005a; Visel et al., 2009).
Enrichment of H3K4me3 or PolII signals is a strong indicator of an active promoter, while the presence of p300 or H3K4me1 outside of promoter regions has been used as a mark for enhancers. CTCF binding sites are considered as a mark for potential insulator elements. For each transcription factor or chromatin mark in each tissue, ChIP-seq was carried out with at least two biological replicates. Each experiment produced 20-30 million monoclonal, uniquely mapped tags. Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on our understanding of the evolution of gene regulation. Maps of histone modifications Levels of three histone modifications are being determined. H3K4me1 (monomethylation of lysine 4 of histone H3) is a mark for active chromatin and, in the absence of H3K4me3, is one indicator of an enhancer. H3K4me3 (trimethylation of lysine 4 of histone H3) is highly enriched at active promoters. One repressive (Polycomb) mark, H3K27me3, is associated with some silenced genes. Maps of genomic DNA in chromatin with these histone modifications are generated by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short-read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009). A graph of the enrichment for histone modifications is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the probable binding sites deduced by the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.
Maps of Occupancy by Transcription Factors Maps of occupancy of genomic DNA by transcription factors (TFs) are determined by ChIP-seq. This consists of two basic steps: chromatin immunoprecipitation (ChIP) is used to highly enrich genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibodies), followed by massively parallel short-read sequencing to tag the enriched DNA segments. Sequencing is done on the Illumina GAIIx and HiSeq. The sequence tags are mapped back to the mouse genome (Langmead et al. 2009). A graph of the enrichment for TF binding is displayed as the "Signal" track (essentially the counts of mapped reads per interval), and the probable binding sites deduced by the MACS program (Zhang et al. 2008) are shown in the "Peaks" track. Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. The sequence reads, quality scores, and alignment coordinates from these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Cells were grown according to the approved ENCODE cell culture protocols (http://genome-test.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Chromatin immunoprecipitation followed published methods (Johnson & Mortazavi et al., 2007), with the exception of certain experiments for which glutaraldehyde was added to the crosslinking reaction. Information on the antibodies used is available via the metadata for each subtrack. Libraries were constructed using the Illumina ChIP-seq Sample Preparation Kit or using a modified protocol that includes the addition of multiplexing tags to the fragments.
DNA fragments were repaired to generate blunt ends and a single A nucleotide was added to each end. Double-stranded Illumina adaptors or Double-stranded Illumina adaptors with multiplexing tags were ligated to the fragments. Ligation products were amplified by 18 cycles of PCR, and the DNA between 150-250 bp was gel purified. Completed libraries were quantified with Quant-iT dsDNA HS Assay Kit. The DNA library was sequenced on the Illumina GAII and GAIIx sequencing systems, and more recently, for multiplexed libraries, several of them were pooled and sequenced on the HiSeq platform. Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. Older libraries were generated using 2 rounds of PCR. Matched input samples were sequenced for each variation of fixation conditions and the number of PCR rounds. Reads of 32 bp, 36 bp or 50 bp length were generated. Sequencing reads (fastq files) were assigned to the corresponding libraries based on the multiplexing tag for pooled libraries (all tags have been removed from reads in the fastq files available for download) or directly processed. Bowtie (Langmead et al., 2009) was used to map reads to the male or female version of the mouse genome (excluding the _random chromosomes in the assembly) depending on the cell line sex. The following parameters were used: "-v 2 -k 11 -m 10 -t --best --strata". Aligned reads were converted into rds files using the ERANGE package (Johnson & Mortazavi et al., 2007) and the findall.py program in ERANGE was used to identify enriched regions against the matching input sample. The following settings were used for point-source transcription factors: "--shift learn --ratio 3 --minimum 2 --listPeak --revbackground". For histone modifications, the settings were changed to "--notrim --nodirectionality --spacing 100 --ratio 3 --minimum 2 --listPeak --revbackground". 
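The Bowtie settings quoted above can be assembled into a command line as in the following minimal sketch. The option string is taken from the text; the index prefix and file names are hypothetical placeholders, and the helper only builds the argument list without running the aligner.

```python
import shlex

# Options as quoted in the methods text:
#   -v 2      allow at most 2 mismatches per alignment
#   -k 11     report up to 11 valid alignments per read
#   -m 10     suppress reads with more than 10 reportable alignments
#   -t        print timing information
#   --best --strata   report only alignments in the best mismatch stratum
BOWTIE_OPTIONS = "-v 2 -k 11 -m 10 -t --best --strata"

def bowtie_command(index_prefix, fastq_path, output_path):
    """Build the argv list for one Bowtie run (names are illustrative)."""
    return ["bowtie", *shlex.split(BOWTIE_OPTIONS), index_prefix, fastq_path, output_path]

# Hypothetical index and file names; a male or female mm9 index would be
# chosen according to the sex of the cell line.
cmd = bowtie_command("mm9_male", "library.fastq", "library.bowtie.txt")
print(shlex.join(cmd))
```

Keeping the command as a list makes it safe to hand to subprocess.run without shell quoting issues.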
Cells were grown according to the approved ENCODE cell culture protocols (http://genome.ucsc.edu/ENCODE/protocols/cell/mouse). RNA-Seq RNA samples from tissues and primary cells were extracted with Trizol® according to the manufacturer's protocol (Invitrogen). PolyA+ RNA was purified with the Dynabeads mRNA purification kit (Invitrogen). The mRNA libraries were prepared for strand-specific sequencing as described previously (Parkhomchuk et al., 2009). Sequencing and Analysis Samples were sequenced on Illumina Genome Analyzer II, Genome Analyzer IIx and HiSeq 2000 platforms for 36 cycles. Image analysis, base calling and alignment to the mouse genome version mm9 were performed using Illumina's RTA. Alignment to the mouse genome was performed using TopHat (Trapnell et al., 2009). Wig files were generated by TopHat and expression levels were calculated with Cufflinks (Trapnell et al., 2010). Cells were grown according to the approved ENCODE cell culture protocols (http://genome.ucsc.edu/ENCODE/protocols/cell/mouse). Enrichment and Library Preparation Chromatin immunoprecipitation was performed according to the Ren Lab ChIP Protocol (http://bioinformatics-renlab.ucsd.edu/RenLabChipProtocolV1.pdf). Library construction was performed according to the Ren Lab Library Protocol (http://bioinformatics-renlab.ucsd.edu/RenLabLibraryProtocolV1.pdf). Sequencing and Analysis Samples were sequenced on Illumina Genome Analyzer II, Genome Analyzer IIx and HiSeq 2000 platforms for 36 cycles. Image analysis, base calling and alignment to the mouse genome version mm9 were performed using Illumina's RTA and Genome Analyzer Pipeline software. Alignment to the mouse genome was performed using ELAND or Bowtie (Langmead et al., 2009) with a seed length of 25 and allowing up to two mismatches. Only the sequences that mapped to one location were used for further analysis. Of those sequences, clonal reads, defined as having the same start position on the same strand, were discarded.
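The two read filters just described (unique mapping, then removal of clonal reads) can be sketched as follows. This is an illustration under stated assumptions, not the custom scripts used for the track: alignments are represented as hypothetical (read_id, chrom, start, strand) tuples, and one read is kept from each clonal group, which is a common convention that the text does not spell out.

```python
from collections import Counter

def filter_reads(alignments):
    """Keep uniquely mapped reads and collapse clonal reads.

    alignments: list of (read_id, chrom, start, strand) tuples, one per alignment.
    A read listed more than once mapped to several locations and is dropped.
    Reads sharing chrom, start and strand are treated as clonal; only the first
    one seen is kept (an assumption made for this sketch).
    """
    hits_per_read = Counter(read_id for read_id, _, _, _ in alignments)
    seen_positions = set()
    kept = []
    for read_id, chrom, start, strand in alignments:
        if hits_per_read[read_id] != 1:
            continue  # multi-mapping read
        key = (chrom, start, strand)
        if key in seen_positions:
            continue  # clonal duplicate of a read already kept
        seen_positions.add(key)
        kept.append((read_id, chrom, start, strand))
    return kept

# Hypothetical alignments: r2 is clonal with r1, r4 maps to two places.
example = [
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 100, "+"),
    ("r3", "chr1", 100, "-"),
    ("r4", "chr2", 50, "+"),
    ("r4", "chr3", 70, "+"),
    ("r5", "chr1", 200, "+"),
]
kept_ids = [read_id for read_id, _, _, _ in filter_reads(example)]
print(kept_ids)
```

Reads r1 and r3 start at the same coordinate but on opposite strands, so both survive the clonal filter.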
BED and wig files were created using custom Perl scripts. Cells were grown according to the approved ENCODE cell culture protocols. The chromatin immunoprecipitation followed published methods (Welch et al. 2004). Information on antibodies used is available via the hyperlinks in the "Select subtracks" menu. Samples passing initial quality thresholds (enrichment and depletion for positive and negative controls, if available, by quantitative PCR of ChIP material) are processed for library construction for Illumina sequencing, using the ChIP-seq Sample Preparation Kit purchased from Illumina. Starting with a 10 ng sample of ChIP DNA, DNA fragments were repaired to generate blunt ends and a single A nucleotide was added to each end. Double-stranded Illumina adaptors were ligated to the fragments. Ligation products were amplified by 18 cycles of PCR, and the DNA between 250-350 bp was gel purified. Completed libraries were quantified with Quant-iT dsDNA HS Assay Kit. The DNA library was sequenced on the Illumina Genome Analyzer II sequencing system, and more recently on the HiSeq. Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples except time course samples are assayed as biological replicates. The data displayed are from the pooled reads for all replicates, but individual replicates are available by download. The resulting 36-nucleotide sequence reads (fastq files) were moved to a data library in Galaxy, and the tools implemented in Galaxy were used for further processing via workflows (Blankenberg et al. 2010). The reads were mapped to the mouse genome (mm9 assembly) using the program Bowtie (Langmead et al. 2009), and the files of mapped reads for the ChIP sample and from the "input" control (no antibody) were processed by MACS (Zhang et al. 2008) to call peaks for occupancy by transcription factors, using the parameters mfold=15, bandwidth=125.
Because the signal for some histone modifications is not expected to be tightly localized (compared to a transcription factor), peak calling programs may not be appropriate. Thus, in addition, we provide wiggle tracks with tag counts for every 10 bp segment. Per-replicate alignments and sequences are available for download on the downloads page. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). The chromatin immunoprecipitation followed published methods (Welch et al. 2004). Information on antibodies used is available via the hyperlinks in the "Select subtracks" menu. Samples passing initial quality thresholds (enrichment and depletion for positive and negative controls, if available, by quantitative PCR of ChIP material) are processed for library construction for Illumina sequencing, using the ChIP-seq Sample Preparation Kit purchased from Illumina. Starting with a 10 ng sample of ChIP DNA, DNA fragments were repaired to generate blunt ends and a single A nucleotide was added to each end. Double-stranded Illumina adaptors were ligated to the fragments. Ligation products were amplified by 18 cycles of PCR, and the DNA between 250-350 bp was gel purified. Completed libraries were quantified with Quant-iT dsDNA HS Assay Kit. The DNA library was sequenced on the Illumina Genome Analyzer II sequencing system, and more recently on the HiSeq. Cluster generation, linearization, blocking and sequencing primer reagents were provided in the Illumina Cluster Amplification kits. All samples except time course samples are assayed as biological replicates. The data displayed are from the pooled reads for all replicates, but individual replicates are available by download. The resulting 36-nucleotide sequence reads (fastq files) were moved to a data library in Galaxy, and the tools implemented in Galaxy were used for further processing via workflows (Blankenberg et al. 2010).
The reads were mapped to the mouse genome (mm9 assembly) using the program Bowtie (Langmead et al. 2009), and the files of mapped reads for the ChIP sample and from the "input" control (no antibody) were processed by MACS (Zhang et al. 2008) to call peaks for occupancy by transcription factors, using the parameters mfold=15, bandwidth=125. Per-replicate alignments and sequences are available for download on the downloads page (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodePsuTfbs/). Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). For details on the chromatin immunoprecipitation protocol used, see Euskirchen et al. (2007), Rozowsky et al. (2009) and Auerbach et al. (2009). DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data was scored based on sequence reads (length ~30 bp) that align uniquely to the human genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed, where the signal height is the number of overlapping fragments at each nucleotide position in the genome. Reads were pooled from all submitted replicates to generate the Peak and Signal files. Per-replicate alignments and sequences are available for download on the downloads page (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/encodeDCC/wgEncodeSydhTfbs/). For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate <= 0.01 when comparing the number of peaks above said threshold to the number of peaks obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment).
The number of mapped tags in a putative binding region is compared to the normalized number of mapped tags in the same region from an input DNA control (normalization is done by correlating tag counts in genomic 10 kb windows). Using a binomial test, only regions that have a p-value <= 0.01 are considered to be significantly enriched compared to the input DNA control. Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Fresh tissues were harvested from mice and the nuclei prepared according to the tissue-appropriate protocol (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Digital DNaseI was performed by DNaseI digestion of intact nuclei, isolating DNaseI 'double-hit' fragments as described in Sabo et al. (2006), and direct sequencing of fragment ends (which correspond to in vivo DNaseI cleavage sites) using the Illumina IIx (and Illumina HiSeq by early 2011) platform (36 bp reads). Uniquely mapping high-quality reads were aligned to the genome using the Bowtie aligner. DNaseI sensitivity is directly reflected in raw tag density, which is shown in the track as the density of tags mapping within a 150 bp sliding window (at a 20 bp step across the genome). DNaseI sensitive zones (HotSpots) were identified using the HotSpot algorithm described in Sabo et al. (2004). False discovery rate thresholds of 1.0% (FDR 0.01) were computed for each cell type by applying the HotSpot algorithm to an equivalent number of random uniquely mapping 36mers. DNaseI hypersensitive sites (DHSs or Peaks) were identified as signal peaks within FDR 1.0% hypersensitive zones using a peak-finding algorithm (I-max). Cells were grown according to the approved ENCODE cell culture protocols (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse). Fresh tissues were harvested from mice and the nuclei prepared according to the tissue-appropriate protocol (http://hgwdev.cse.ucsc.edu/ENCODE/protocols/cell/mouse).
Reads were aligned to mm9 reference using ABI BioScope software version 1.2.1. Colorspace FASTQ format files were created using Heng Li's solid2fastq.pl script version 0.1.4, representing 0,1,2,3 color codes with the letters A,C,G,T respectively. Signal files were created from the BAM alignments using BEDTools.
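The color-to-letter encoding mentioned above can be illustrated with a short sketch. It reproduces only the 0,1,2,3 to A,C,G,T substitution stated in the text; it is not the solid2fastq.pl script, and the function name encode_colors and the example read are hypothetical.

```python
# Map SOLiD color codes to letters as stated in the text: 0->A, 1->C, 2->G, 3->T.
COLOR_TO_LETTER = str.maketrans("0123", "ACGT")

def encode_colors(colors):
    """Re-encode a string of color codes as letters.

    The letters are only a notation for colors; they are not base calls and
    must be interpreted by a colorspace-aware aligner.
    """
    unexpected = set(colors) - set("0123")
    if unexpected:
        raise ValueError(f"unexpected color codes: {sorted(unexpected)}")
    return colors.translate(COLOR_TO_LETTER)

# Hypothetical colorspace read (primer base already removed).
encoded = encode_colors("0123320")
print(encoded)
```

Because each color describes a transition between two adjacent bases, the resulting letter string should not be read as a nucleotide sequence.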

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Barbara Wold mailto:woldb@caltech.edu, Georgi K. Marinov mailto:georgi@caltech.edu, Diane Trout mailto:diane@caltech.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Rationale for the Mouse ENCODE project Our knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and we interpret these as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. 
Thus, we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. The results will have a significant impact on our understanding of the evolution of gene regulation. Maps of Occupancy by Transcription Factors Genome-wide occupancy maps of transcription factors (TFs) are generated by ChIP-seq. A ChIP-seq experiment combines a chromatin immunoprecipitation (ChIP) experiment that enriches genomic DNA for the segments bound by specific proteins (the antigens recognized by the antibody) with high-throughput short-read sequencing of the enriched DNA fragments (Wold & Myers, 2008). Proteins are crosslinked to DNA (usually with formaldehyde), and chromatin is sheared and immunoprecipitated with the antibody of interest. The immunoprecipitated material is turned into a sequencing library and sequenced. The sequencing reads are then aligned to the genome. A control sample consisting of sonicated chromatin that either has not been immunoprecipitated or has been immunoprecipitated with a non-specific immunoglobulin is also sequenced. The ChIP and the control datasets are analyzed with a variety of software packages to identify regions occupied by the target protein. The sequencing data, alignments and analysis files for these experiments are available for download. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Project description:This data was generated by ENCODE. If you have questions about the data, contact the submitting laboratory directly (Ross Hardison mailto:rch8@psu.edu). If you have questions about the Genome Browser track associated with this data, contact ENCODE (mailto:genome@soe.ucsc.edu). Knowledge of the function of genomic DNA sequences comes from three basic approaches. Genetics uses changes in behavior or structure of a cell or organism in response to changes in DNA sequence to infer function of the altered sequence. Biochemical approaches monitor states of histone modification, binding of specific transcription factors, accessibility to DNases and other epigenetic features along genomic DNA. In general, these are associated with gene activity, but the precise relationships remain to be established. The third approach is evolutionary, using comparisons among homologous DNA sequences to find segments that are evolving more slowly or more rapidly than expected given the local rate of neutral change. These are inferred to be under negative or positive selection, respectively, and interpreted as DNA sequences needed for a preserved (negative selection) or adaptive (positive selection) function. The ENCODE project aims to discover all the DNA sequences associated with various epigenetic features, with the reasonable expectation that these will also be functional (best tested by genetic methods). However, it is not clear how to relate these results with those from evolutionary analyses. The mouse ENCODE project aims to make this connection explicitly and with a moderate breadth. Assays identical to those being used in the ENCODE project are performed in cell types in mouse that are similar or homologous to those studied in the human project. Thus we will be able to discover which epigenetic features are conserved between mouse and human, and we can examine the extent to which these overlap with the DNA sequences under negative selection. 
The contribution of DNA with a function preserved in mammals versus that with a function in only one species will be discovered. One of the epigenetic features most closely related to genomic activity is the production of stable RNA, including transcripts from both protein-coding genes and noncoding transcripts. These genomic compilations of transcripts, or transcriptomes, are primary determinants of the way cells function, respond and differentiate, both through the production of proteins translated from coding transcripts and through the regulatory activity of untranslated non-coding transcripts. Non-coding RNAs regulate gene expression through diverse mechanisms ranging from reducing chromatin accessibility (affecting large regions or whole chromosomes) to precise fine-tuning of transcription from specific genes, e.g. via RNAi. Even though a large proportion of mammalian genomes is transcribed, many of the transcribed segments have yet to be assigned any function. The ENCODE project aims to create a comprehensive, quantitative annotation of the human transcriptome in several cell and tissue types, as well as to understand regulation of transcriptomes by establishing the relationship between regulatory factors and their targets. Mapping the mouse transcriptome in similar tissues will allow us to discern conservation of transcriptome profiles between mouse and human, to discover species-specific transcription patterns, and to infer conserved versus species-specific regulatory mechanisms. The results will have a significant impact on our understanding of the evolution of gene regulation. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf

Dataset Information

Histone Modifications by ChIP-seq from ENCODE/Caltech
