Project description: Temporal dynamics and mechanisms underlying epigenetic changes in Huntington’s disease (HD), a neurodegenerative disease primarily affecting the striatum, remain unclear. Using slowly progressing HD knock-in mice, we generated chromatin immunoprecipitation followed by sequencing (ChIP-seq) data for RNA polymerase II and for histone modifications associated with active enhancers (H3K27ac) and repressive chromatin (H3K27me3), from neuronal, non-neuronal and bulk striatal tissue at two early disease stages. Integration with cell type-specific transcriptomic databases shows that, early in the disease, the HD mutation accelerates age-related reprogramming of neuronal and glial cell identities at both the epigenetic and transcriptional levels. Circular chromosome conformation capture followed by sequencing (4C-seq) of HD mouse striatum revealed alterations at both neuronal super-enhancer and CAG-expanded disease loci. Modelling higher-order chromatin architecture from these data indicated that the HD CAG expansion mutation impairs chromatin insulation and gene regulation. Thus, both age-dependent and disease locus-specific mechanisms contribute to early remodelling of chromatin structure in the HD striatum.
Project description: 4C-Seq has proven to be a powerful technique for identifying genome-wide interactions with a single locus of interest (the "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. One important consideration is that signal resolution varies across the genome with 3D distance from the bait: the signal is highest in the region immediately surrounding the bait and progressively lower in far-cis and trans regions. Another important aspect of 4C-Seq experiments is resolution, which is strongly influenced by the choice of restriction enzyme and how frequently it cuts the genome. A 4C-Seq analysis method therefore needs to be flexible enough to analyze data generated with different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis identify interactions either in regions near the bait or in far-cis and trans regions, but no method comprehensively analyzes 4C signals across different length scales. In addition, some methods fail in experiments where chromatin fragments are generated with frequent-cutter restriction enzymes. Here, we describe 4C-ker, a hidden Markov model (HMM)-based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. We also incorporate methods for identifying differential interactions across multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage among near-bait regions, far-cis regions and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with restriction enzymes of different resolution.
4C-Seq experiments from Igh and Cd83 baits in activated B cells, and from a Tcrb (Eb) bait in double-negative (DN) T cells and immature B cells. RNA-Seq and ATAC-Seq experiments in DN T cells and immature B cells.
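The 4C-ker approach of binning restriction fragments into windows and segmenting them with an HMM can be illustrated with a minimal Python sketch. It assumes a two-state model (background vs. interacting) with Gaussian emissions on log-transformed window counts and non-overlapping windows of k consecutive fragments. The helper names, state means, standard deviation and transition probability are hypothetical choices for illustration; the published 4C-ker model (number of states, window sizes, parameter estimation) may differ.

```python
import math
from typing import List, Sequence, Tuple


def window_counts(frag_counts: Sequence[int], k: int) -> List[int]:
    """Sum read counts over non-overlapping windows of k consecutive fragments.

    Hypothetical helper: a small k suits dense near-bait signal, while a larger k
    pools sparse far-cis/trans signal (the 'adaptive window' idea).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    return [sum(frag_counts[i:i + k]) for i in range(0, len(frag_counts), k)]


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a Gaussian, used here as the HMM emission model (an assumption)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_two_state(
    values: Sequence[float],
    means: Tuple[float, float] = (0.0, 3.0),  # hypothetical state means
    sd: float = 1.0,                          # hypothetical shared standard deviation
    p_stay: float = 0.9,                      # hypothetical self-transition probability
) -> List[int]:
    """Most likely state path via Viterbi: 0 = background, 1 = interacting."""
    states = range(2)
    log_trans = [[math.log(p_stay if i == j else 1.0 - p_stay) for j in states] for i in states]
    # Uniform start probabilities.
    score = [math.log(0.5) + log_normal_pdf(values[0], means[s], sd) for s in states]
    backptrs: List[List[int]] = []
    for x in values[1:]:
        prev = score
        # For each current state, the best previous state.
        ptrs = [max(states, key=lambda r: prev[r] + log_trans[r][s]) for s in states]
        score = [prev[ptrs[s]] + log_trans[ptrs[s]][s] + log_normal_pdf(x, means[s], sd)
                 for s in states]
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    frags = [0, 1, 0, 0, 0, 1, 40, 55, 60, 38, 45, 50, 1, 0, 0, 0, 1, 0]
    windows = window_counts(frags, k=3)  # [1, 1, 155, 133, 1, 1]
    print(viterbi_two_state([math.log1p(c) for c in windows]))  # [0, 0, 1, 1, 0, 0]
```

Under the adaptive-window idea, `window_counts` would be called with a small k near the bait and a larger k in far-cis and trans regions, where coverage is sparse; the actual k values used by 4C-ker are not specified here.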
Project description: Using 4C-seq, we investigated genome-wide DNA contacts made by the FLC gene in the model plant Arabidopsis thaliana to explore a potential role for long-distance chromosomal interactions in the regulation of flowering.
Project description: The ability to correlate chromosome conformation with gene expression provides a great deal of information about the strategies a cell uses to regulate gene activity properly. 4C-seq is a relatively new and increasingly popular technology that determines the set of genomic interactions made by a single point in the genome. 4C-seq experiments generate large, complicated datasets, and it is imperative that signal be properly distinguished from noise. Currently, there are a limited number of methods for analyzing 4C-seq data. Here, we present a new method, fourSig, which, in addition to being simple to use and as precise as current methods, includes a new feature that prioritizes significantly enriched interactions and predicts their reproducibility among experimental replicates. We demonstrate the efficacy of fourSig with previously published and novel 4C-seq datasets and show that our significance prioritization correlates with the ability to reproducibly detect interactions among replicates. The datasets provided include those generated from allele-specific 4C-seq with a viewpoint at the transcription start site (TSS) of the gene Ibtk on mouse chromosome 9. FASTQ files, text files containing genomic coordinates and read counts, and bedGraph files for UCSC Genome Browser tracks are provided. All sequences were mapped to mouse genome build mm9. Circular chromosome conformation capture sequencing (4C-seq) was performed at the TSS of Ibtk, in three replicates, in F1 hybrid mouse trophoblast stem (TS) cells. The experiment was designed to detect allele-specific patterns using SNP differences between the inbred strains mated to produce the TS cells (C57BL/6 and CAST/EiJ).
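Separating 4C-seq signal from noise is often framed as a significance test on windows of consecutive fragments. The minimal Python sketch below assumes a permutation-based null in which fragment read counts are shuffled along the chromosome and an empirical cutoff is taken from the shuffled window sums. It illustrates that general idea only; it is not claimed to be fourSig's exact procedure and does not implement its reproducibility-based prioritization. The window size, permutation count, alpha and function names are hypothetical.

```python
import random
from typing import List, Sequence


def window_sums(counts: Sequence[int], w: int) -> List[int]:
    """Sliding-window read sums over w consecutive fragments."""
    return [sum(counts[i:i + w]) for i in range(len(counts) - w + 1)]


def permutation_cutoff(counts: Sequence[int], w: int, n_perm: int = 100,
                       alpha: float = 0.01, seed: int = 0) -> int:
    """Window-sum cutoff from a null built by shuffling fragment counts.

    Shuffling keeps the total read count and the per-fragment count distribution
    but destroys spatial clustering, so clustered high counts become rare.
    """
    rng = random.Random(seed)
    shuffled = list(counts)
    null: List[int] = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.extend(window_sums(shuffled, w))
    null.sort()
    # Empirical (1 - alpha) quantile of the null window sums.
    return null[min(len(null) - 1, int((1 - alpha) * len(null)))]


def significant_windows(counts: Sequence[int], w: int, **kwargs) -> List[int]:
    """Start indices of windows whose observed sum exceeds the permutation cutoff."""
    cutoff = permutation_cutoff(counts, w, **kwargs)
    return [i for i, s in enumerate(window_sums(counts, w)) if s > cutoff]


if __name__ == "__main__":
    counts = [1] * 100
    counts[50:55] = [100] * 5  # a planted cluster of high-count fragments
    print(significant_windows(counts, w=5))
```

Because the shuffled null preserves the total signal, a window is called only when its reads are spatially clustered beyond what random placement produces; applying the same test to each replicate is one simple way to check whether a called window recurs.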
Project description: We performed DNA sequencing of potential biallelic SNPs in HD-B and DM1-A patient cell lines. These potential biallelic SNPs were identified in the 4C-seq interaction data. To confirm a subset of them, we PCR-amplified the genomic regions containing these SNPs and performed 2 × 150 bp paired-end sequencing on an Illumina MiSeq Nano run.
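Turning amplicon reads into a confirm/reject call for each candidate SNP could look like the minimal Python sketch below. It assumes that reference and alternative allele read counts have already been tallied per SNP, and it calls a SNP biallelic when depth is adequate and both alleles are well represented. The class name, SNP identifiers and both thresholds are hypothetical, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class AmpliconSnp:
    """Read counts for one candidate SNP in the amplicon sequencing data."""
    snp_id: str
    ref_reads: int
    alt_reads: int


def is_confirmed_biallelic(snp: AmpliconSnp, min_depth: int = 20,
                           min_allele_frac: float = 0.2) -> bool:
    """True if both alleles are supported at adequate depth.

    min_depth and min_allele_frac are hypothetical thresholds for illustration.
    """
    depth = snp.ref_reads + snp.alt_reads
    if depth < min_depth:
        return False  # too few reads to call the SNP either way
    alt_frac = snp.alt_reads / depth
    return min_allele_frac <= alt_frac <= 1 - min_allele_frac


if __name__ == "__main__":
    for snp in [AmpliconSnp("snp_a", 48, 52), AmpliconSnp("snp_b", 99, 1)]:
        print(snp.snp_id, is_confirmed_biallelic(snp))
```

A symmetric allele-fraction window keeps the rule agnostic to which allele is labeled reference; a binomial test against 0.5 would be a stricter alternative.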
Project description: With its capacity for high-resolution data output in one region of interest, chromosome conformation capture combined with high-throughput sequencing (4C-seq) is a state-of-the-art next-generation sequencing technique that provides epigenetic insights and regularly advances medical research. However, 4C-seq data are complex and prone to biases, and although specialized programs exist, an unbiased, extensive benchmark is still lacking. Furthermore, neither substantial datasets with a fully characterized ground truth nor simulation programs for realistic 4C-seq data have been published. We conducted a benchmarking study on 54 4C-seq samples from 12 datasets, including original murine BMM, T-cell and 416B data, and developed novel 4C-seq simulation software to allow more detailed comparisons of 4C-seq algorithms on 50 simulated datasets with 10 to 120 samples each.
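The value of simulated 4C-seq data lies in having a known ground truth. The minimal Python sketch below shows one way to generate such data: an expected per-fragment profile that decays as a power law with distance from the bait, plus planted interaction peaks, from which each replicate is drawn as a multinomial sample of a fixed read depth. It is not the simulation software described above; the decay exponent, background level, peak positions and function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def expected_profile(n_frags: int, bait: int, decay: float = 1.0,
                     peaks: Sequence[Tuple[int, float]] = (),
                     background: float = 1e-3) -> List[float]:
    """Relative read weight per fragment: power-law decay from the bait plus planted peaks."""
    weights = [1.0 / (1 + abs(i - bait)) ** decay + background for i in range(n_frags)]
    for pos, strength in peaks:  # known 'ground truth' interactions
        weights[pos] += strength
    return weights


def simulate_sample(weights: Sequence[float], depth: int,
                    seed: Optional[int] = None) -> List[int]:
    """Multinomial draw: assign each of `depth` reads to a fragment in proportion to its weight."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for idx in rng.choices(range(len(weights)), weights=weights, k=depth):
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    profile = expected_profile(200, bait=100, peaks=[(180, 0.5)])
    replicates = [simulate_sample(profile, 10_000, seed=s) for s in range(3)]
    for rep in replicates:
        print(rep[100], rep[180], rep[181])
```

Because the planted peak positions are known, any 4C-seq caller run on the replicates can be scored directly for recall and false positives.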