Project description:Regulatory DNA elements can control expression of distant genes via physical interactions. Here, we present a cost-effective methodology and computational analysis pipeline for robust characterization of the physical organization around selected promoters and other functional elements using Chromosome Conformation Capture combined with high-throughput sequencing (4C-seq) data. Our approach can be multiplexed and routinely integrated with other functional genomics assays to facilitate physical characterization of gene regulation. A high resolution 4C-seq protocol involving two restriction digests and a revised analysis pipeline was applied to several viewpoints in four genomic loci (the well-characterized alpha-globin and beta-globin loci, and the novel Oct4 and Satb1 loci), allowing robust detection of physical interactions between regulatory DNA elements.
Project description:4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the differences in resolution of signal across the genome that result from differences in 3D distance separation from the bait. This leads to the highest signal in the region immediately surrounding the bait and increasingly lower signals in far-cis and trans. Another important aspect of 4C-Seq experiments is the resolution, which is greatly influenced by the choice of restriction enzyme and the frequency at which it can cut the genome. Thus, it is important that a 4C-Seq analysis method is flexible enough to analyze data generated using different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis only identify interactions in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals of different length scales. In addition, some methods also fail in experiments where chromatin fragments are generated using frequent cutter restriction enzymes. Here, we describe 4C-ker, a Hidden-Markov Model based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for the identification of differential interactions in multiple 4C-seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with different resolution enzymes.
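The adaptive-window idea in the 4C-ker description above can be illustrated with a minimal sketch. It assumes windows are formed by grouping a fixed number `k` of consecutive observed fragments, so windows stay narrow where coverage is dense (near the bait) and widen where it is sparse (far-cis and trans). The function name `adaptive_windows`, the parameter `k`, and the positions are hypothetical; this is not the 4C-ker implementation.

```python
from typing import List, Tuple


def adaptive_windows(positions: List[int], k: int) -> List[Tuple[int, int]]:
    """Group sorted fragment positions into consecutive windows of k fragments.

    Returns one (start, end) pair per window. The last window may hold
    fewer than k fragments. Illustrative assumption, not 4C-ker's code.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(positions)
    windows = []
    for i in range(0, len(ordered), k):
        chunk = ordered[i:i + k]
        windows.append((chunk[0], chunk[-1]))
    return windows


# Hypothetical fragment positions: dense near the bait, sparse further away.
positions = [100, 110, 120, 130, 500, 900, 2000, 5000]
windows = adaptive_windows(positions, k=2)
widths = [end - start for start, end in windows]
print(windows)
print(widths)
```

With a fixed fragment count per window, the genomic width of each window grows as coverage drops, which is the property the abstract attributes to adaptive window sizes.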
Project description:Regulatory DNA elements can control expression of distant genes via physical interactions. Here, we present a cost-effective methodology and computational analysis pipeline for robust characterization of the physical organization around selected promoters and other functional elements using Chromosome Conformation Capture combined with high-throughput sequencing (4C-seq) data. Our approach can be multiplexed and routinely integrated with other functional genomics assays to facilitate physical characterization of gene regulation. Overall design: A high resolution 4C-seq protocol involving two restriction digests and a revised analysis pipeline was applied to several viewpoints in four genomic loci (the well-characterized alpha-globin and beta-globin loci, and the novel Oct4 and Satb1 loci), allowing robust detection of physical interactions between regulatory DNA elements.
Project description:Genome-wide association studies (GWAS) have revealed many susceptibility loci for complex genetic diseases. For most loci, the causal genes have not been identified. Currently, the identification of candidate genes is predominantly based on genes that localize close to or within identified loci. We have recently shown that 92 of the 163 inflammatory bowel disease (IBD)-loci co-localize with non-coding DNA regulatory elements (DREs). Mutations in DREs can contribute to IBD pathogenesis through dysregulation of gene expression. Consequently, genes that are regulated by these 92 DREs are to be considered as candidate genes. This study uses circular chromosome conformation capture-sequencing (4C-seq) to systematically analyze chromatin-interactions at IBD susceptibility loci that localize to regulatory DNA. Using 4C-seq, we identify genomic regions that physically interact with the 92 DREs that were found at IBD susceptibility loci. Since the activity of regulatory elements is cell-type specific, 4C-seq was performed in monocytes, lymphocytes, and intestinal epithelial cells. Altogether, we identified 902 novel IBD candidate genes. These include genes specific for IBD-subtypes and many noteworthy genes including ATG9A and IL10RA. We show that expression of many novel candidate genes is genotype-dependent and that these genes are upregulated during intestinal inflammation in IBD. Furthermore, we identify HNF4α as a potential key upstream regulator of IBD candidate genes. We reveal many novel and relevant IBD candidate genes, pathways, and regulators. Our approach complements classical candidate gene identification, links novel genes to IBD and can be applied to any existing GWAS data.
Project description:The ability to correlate chromosome conformation and gene expression gives a great deal of information regarding the strategies used by a cell to properly regulate gene activity. 4C-Seq is a relatively new and increasingly popular technology where the set of genomic interactions generated by a single point in the genome can be determined. 4C-Seq experiments generate large, complicated data sets and it is imperative that signal is properly distinguished from noise. Currently, there are a limited number of methods for analyzing 4C-Seq data. Here, we present a new method, fourSig, which in addition to being precise and simple to use also includes a new feature that prioritizes detected interactions. Our results demonstrate the efficacy of fourSig with previously published and novel 4C-Seq data sets and show that our significance prioritization correlates with the ability to reproducibly detect interactions among replicates.
Project description:MOTIVATION: With its capacity for high-resolution data output in one region of interest, chromosome conformation capture combined with high-throughput sequencing (4C-seq) is a state-of-the-art next-generation sequencing technique that provides epigenetic insights, and regularly advances current medical research. However, 4C-seq data are complex and prone to biases, and while specialized programs exist, an unbiased, extensive benchmarking is still lacking. Furthermore, neither substantial datasets with fully characterized ground truth, nor simulation programs for realistic 4C-seq data have been published. RESULTS: We conducted a benchmarking study on 66 4C-seq samples from 20 datasets, and developed a novel 4C-seq simulation software, Basic4CSim, to allow for detailed comparisons of 4C-seq algorithms on 50 simulated datasets with 10-120 samples each. Simulations and benchmarking were adapted to address different characteristics of 4C-seq data. Simulated data were compared with published samples to validate simulation settings. We identified differences between 4C-seq algorithms in terms of precision, recall, interaction structure, and run time, and observed general trends. Novel differential pipeline versions of single-sample based 4C-seq algorithms were included in the benchmarking. While no single tool was optimally suited for both near-cis and far-cis, and both single-sample and differential analyses, choosing a high-performing algorithm variant did improve results considerably. For near-cis scenarios, r3Cseq, peakC and FourCSeq offered high precision, while fourSig demonstrated high overall F1 scores in far-cis analyses. Finally, 4C-seq simulations may aid in the development of improved analysis algorithms. AVAILABILITY AND IMPLEMENTATION: Basic4CSim is available at https://github.com/walter-ca/Basic4CSim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
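The precision, recall and F1 measures named in the benchmarking description above follow the standard definitions. The sketch below assumes interactions are represented as sets of window identifiers and compares called windows against a simulated ground truth. The function name `score_calls` and the identifiers are hypothetical, and the benchmark's own matching rules may differ.

```python
from typing import Dict, Set


def score_calls(called: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Compute precision, recall and F1 for called vs. true interaction windows."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical window identifiers, for illustration only.
truth = {"w1", "w2", "w3", "w4"}
called = {"w2", "w3", "w5"}
scores = score_calls(called, truth)
print(scores)
```

Here two of the three called windows are true interactions, and two of the four true interactions are recovered, so precision is 2/3, recall is 1/2 and F1 is 4/7.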
Project description:Circularized Chromosome Conformation Capture followed by deep sequencing (4C-Seq) is a powerful technique to identify genome-wide partners interacting with a pre-specified genomic locus. Here, we present a computational and statistical approach to analyze 4C-Seq data generated from both enzyme digestion and sonication fragmentation-based methods. We implemented a command line software tool and a web interface called w4CSeq, which takes in the raw 4C sequencing data (FASTQ files) as input, performs automated statistical analysis and presents results in a user-friendly manner. Besides providing users with the list of candidate interacting sites/regions, w4CSeq generates figures showing genome-wide distribution of interacting regions, and sketches the enrichment of key features such as TSSs, TTSs, CpG sites and DNA replication timing around 4C sites. Users can establish their own web server by downloading the source code at https://github.com/WGLab/w4CSeq. Additionally, a demo web server is available at http://w4cseq.wglab.org. CONTACT: kaiwang@usc.edu or wangelu@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Next Generation Sequencing (NGS) is a powerful tool that depends on loading a precise amount of DNA onto a flowcell. NGS strategies have expanded our ability to investigate genomic phenomena by referencing mutations in cancer and diseases through large-scale genotyping, developing methods to map rare chromatin interactions (4C, 5C and Hi-C) and identifying chromatin features associated with regulatory elements (ChIP-seq, Bis-Seq, ChIA-PET). While many methods are available for DNA library quantification, there is no unambiguous gold standard. Most techniques use PCR to amplify DNA libraries to obtain sufficient quantities for optical density measurement. However, increased PCR cycles can distort the library's heterogeneity and prevent the detection of rare variants. In this analysis, we compared new digital PCR technologies (droplet digital PCR; ddPCR, ddPCR-Tail) with standard methods for the titration of NGS libraries. The ddPCR-Tail method is comparable to qPCR and fluorometry (Qubit) and allows sensitive quantification by analysis of barcode repartition after sequencing of multiplexed samples. This study provides a direct comparison between quantification methods throughout a complete sequencing experiment and provides the impetus to use ddPCR-based quantification for improvement of NGS quality.
Project description:FOXP3 is a lineage-specific transcription factor that is required for regulatory T cell development and function. In this study, we determined the crystal structure of the FOXP3 forkhead domain bound to DNA. The structure reveals that FOXP3 can form a stable domain-swapped dimer to bridge DNA in the absence of cofactors, suggesting that FOXP3 may play a role in long-range gene interactions. To test this hypothesis, we used circular chromosome conformation capture coupled with high throughput sequencing (4C-seq) to analyze FOXP3-dependent genomic contacts around a known FOXP3-bound locus, Ptpn22. Our studies reveal that FOXP3 induces significant changes in the chromatin contacts between the Ptpn22 locus and other Foxp3-regulated genes, reflecting a mechanism by which FOXP3 reorganizes the genome architecture to coordinate the expression of its target genes. Our results suggest that FOXP3 mediates long-range chromatin interactions as part of its mechanisms to regulate specific gene expression in regulatory T cells.