Computational Processing and Quality Control of Hi-C, Capture Hi-C and Capture-C Data.
ABSTRACT: Hi-C, capture Hi-C (CHC) and Capture-C have contributed greatly to our present understanding of the three-dimensional organization of genomes in the context of transcriptional regulation by characterizing the roles of topologically associating domains, enhancer-promoter loops and other three-dimensional genomic interactions. The analysis is based on counts of chimeric read pairs that map to interacting regions of the genome. However, the processing and quality control of these data present a number of unique challenges. We review here the experimental and computational foundations and explain how the characteristics of restriction digests, sonication fragments and read pairs can be exploited to distinguish technical artefacts from valid read pairs originating from true chromatin interactions.
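As a rough illustration of how restriction digests and read-pair orientation can separate artefacts from valid pairs, the following minimal sketch classifies a read pair from the positions and strands of its two reads. The function names, the category labels and the simplified rules are assumptions made for this example and do not reproduce any particular pipeline.

```python
from bisect import bisect_right


def fragment_index(cut_sites, pos):
    """Return the index of the restriction fragment that contains pos.

    cut_sites is a sorted list of cut positions on one chromosome.
    """
    return bisect_right(cut_sites, pos)


def classify_pair(cut_sites_by_chrom, read1, read2):
    """Classify a read pair; each read is (chrom, five_prime_pos, strand).

    Simplified, hypothetical rules:
      - different chromosomes            -> "valid_trans"
      - same fragment, inward facing     -> "unligated" (dangling end)
      - same fragment, outward facing    -> "self_circle"
      - same fragment, same strand       -> "same_fragment_other"
      - adjacent fragments, inward       -> "religation"
      - anything else on one chromosome  -> "valid_cis"
    """
    chrom1, pos1, strand1 = read1
    chrom2, pos2, strand2 = read2
    if chrom1 != chrom2:
        return "valid_trans"

    sites = cut_sites_by_chrom[chrom1]
    frag1 = fragment_index(sites, pos1)
    frag2 = fragment_index(sites, pos2)

    # Order the two reads by genomic position to judge their orientation.
    (_, left_strand), (_, right_strand) = sorted([(pos1, strand1), (pos2, strand2)])
    inward = left_strand == "+" and right_strand == "-"

    if frag1 == frag2:
        if strand1 == strand2:
            return "same_fragment_other"
        return "unligated" if inward else "self_circle"
    if abs(frag1 - frag2) == 1 and inward:
        return "religation"
    return "valid_cis"


sites = {"chr1": [1000, 2000, 3000]}
print(classify_pair(sites, ("chr1", 1100, "+"), ("chr1", 1800, "-")))
```

A production pipeline would additionally check the inferred size of the sequenced molecule against the sonication fragment size distribution, which this sketch omits.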
Project description:BACKGROUND:Target enrichment combined with chromosome conformation capturing methodologies such as capture Hi-C (CHC) can be used to investigate spatial layouts of genomic regions with high resolution and at scalable costs. A common application of CHC is the investigation of regulatory elements that are in contact with promoters, but CHC can be used for a range of other applications. Therefore, probe design for CHC needs to be adapted to experimental needs, but no flexible tool is currently available for this purpose. RESULTS:We present a Java desktop application called GOPHER (Generator Of Probes for capture Hi-C Experiments at high Resolution) that implements three strategies for CHC probe design. GOPHER's simple approach is similar to the probe design of previous approaches that employ CHC to investigate all promoters, with one probe being placed at each margin of a single digest that overlaps the transcription start site (TSS) of each promoter. GOPHER's simple-patched approach extends this methodology with a heuristic that improves coverage of viewpoints in which the TSS is located near to one of the boundaries of the digest. GOPHER's extended approach is intended mainly for focused investigations of smaller gene sets. GOPHER can also be used to design probes for regions other than TSS such as GWAS hits or large blocks of genomic sequence. GOPHER additionally provides a number of features that allow users to visualize and edit viewpoints, and outputs a range of files useful for documentation, ordering probes, and downstream analysis. CONCLUSION:GOPHER is an easy-to-use and robust desktop application for CHC probe design. Source code and a precompiled executable can be downloaded from the GOPHER GitHub page at https://github.com/TheJacksonLaboratory/Gopher .
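The simple approach described above, with one probe at each margin of the digest that overlaps the TSS, can be sketched as follows. This is not GOPHER's code: the function name, the half-open coordinate convention and the default probe length of 120 bp are assumptions chosen for illustration.

```python
from bisect import bisect_right


def simple_probe_regions(cut_sites, tss, probe_len=120):
    """Return two probe intervals at the margins of the digest containing tss.

    cut_sites: sorted restriction cut positions on one chromosome.
    tss:       transcription start site position on that chromosome.
    probe_len: hypothetical probe length in bp (an assumption of this sketch).

    Intervals are half-open (start, end). Returns None when the TSS is not
    enclosed by two cut sites or the digest is too short for two probes.
    """
    i = bisect_right(cut_sites, tss)
    if i == 0 or i == len(cut_sites):
        return None
    start, end = cut_sites[i - 1], cut_sites[i]
    if end - start < 2 * probe_len:
        return None
    return [(start, start + probe_len), (end - probe_len, end)]


print(simple_probe_regions([1000, 5000, 9000], tss=4800))
```

In this example the TSS lies close to the right boundary of the digest, which is the situation the simple-patched approach is meant to improve on.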
Project description:Capture Hi-C (CHi-C) is a state-of-the-art method for profiling chromosomal interactions involving targeted regions of interest (such as gene promoters) globally and at high resolution. Signal detection in CHi-C data involves a number of statistical challenges that are not observed when using other Hi-C-like techniques. We present a background model, and algorithms for normalisation and multiple testing that are specifically adapted to CHi-C experiments, in which many spatially dispersed regions are captured, such as in Promoter CHi-C. We implement these procedures in CHiCAGO (http://regulatorygenomicsgroup.org/chicago), an open-source package for robust interaction detection in CHi-C. We validate CHiCAGO by showing that promoter-interacting regions detected with this method are enriched for regulatory features and disease-associated SNPs. Three human CHi-C biological replicates were generated (comprising 1, 2 and 3 technical replicates). Two mouse CHi-C biological replicates (both comprising three technical replicates) and a mouse Hi-C dataset were also generated. The publicly available HiCUP pipeline (doi: 10.12688/f1000research.7334.1) was used to process the raw sequencing reads. This pipeline was used to map the read pairs against the mouse (mm9) and human (hg19) genomes, to filter experimental artefacts (such as circularized reads and re-ligations), and to remove duplicate reads. For the CHi-C data, the resulting BAM files were processed into CHiCAGO input files, retaining only those read pairs that mapped, at least on one end, to a captured bait. CHiCAGO then identified Hi-C restriction fragments interacting, with statistical significance, with captured baits.
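The step of retaining only read pairs with at least one end on a captured bait can be sketched as below. The representation of a read pair as a tuple of restriction-fragment IDs and the function name are assumptions for this example; they do not reflect the actual CHiCAGO input file format.

```python
from collections import Counter


def count_baited_pairs(pairs, bait_fragments):
    """Count read pairs per (bait fragment, other-end fragment).

    pairs:          iterable of (fragment_id_a, fragment_id_b) tuples,
                    one per mapped, de-duplicated read pair (hypothetical).
    bait_fragments: set of fragment IDs targeted by capture probes.

    Pairs with no baited end are dropped. A bait-to-bait pair is counted
    once from the perspective of each bait.
    """
    counts = Counter()
    for frag_a, frag_b in pairs:
        if frag_a in bait_fragments:
            counts[(frag_a, frag_b)] += 1
        if frag_b in bait_fragments and frag_b != frag_a:
            counts[(frag_b, frag_a)] += 1
    return counts


pairs = [(1, 5), (5, 1), (2, 3), (1, 7), (7, 9)]
print(count_baited_pairs(pairs, bait_fragments={1, 7}))
```

Counting bait-to-bait pairs from both perspectives is a design choice of this sketch; a real analysis needs to decide explicitly how such pairs are handled.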
Project description:HiCUP is a pipeline for processing sequence data generated by Hi-C and Capture Hi-C (CHi-C) experiments, which are techniques used to investigate three-dimensional genomic organisation. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also produces an easy-to-interpret yet detailed quality control (QC) report that assists in refining experimental protocols for future studies. The software is freely available and has already been used for processing Hi-C and CHi-C data in several recently published peer-reviewed studies.
Project description:It is becoming increasingly important to understand how regulatory elements act on their target genes over long genomic distances. 3C (chromosome conformation capture) and its derived methods are now widely applied to investigate three-dimensional (3D) genome organization and gene regulation. Digestion-ligation-only Hi-C (DLO Hi-C) is a new, efficient and cost-effective technology for whole-genome chromosome conformation capture. Here, we introduce the DLO Hi-C tool, a flexible and versatile pipeline for processing DLO Hi-C data from raw sequencing reads to normalized contact maps and for providing quality controls at different steps. It includes more efficient iterative mapping and linker filtering. We applied the DLO Hi-C tool to different DLO Hi-C datasets and demonstrated its ability to process large datasets with multithreading. The DLO Hi-C tool is suitable for processing DLO Hi-C and in situ DLO Hi-C datasets. It is convenient and efficient for DLO Hi-C data processing.
Project description:Chromosome conformation capture-based methods such as Hi-C have become mainstream techniques for the study of the 3D organization of genomes. These methods convert chromatin interactions reflecting topological chromatin structures into digital information (counts of pair-wise interactions). Here, we describe an updated protocol for Hi-C (Hi-C 2.0) that integrates recent improvements into a single protocol for efficient and high-resolution capture of chromatin interactions. This protocol combines chromatin digestion and frequently cutting enzymes to obtain kilobase (kb) resolution. It also includes steps to reduce random ligation and the generation of uninformative molecules, such as unligated ends, to increase the number of valid intra-chromosomal read pairs. This protocol allows for obtaining information on conformational structures such as compartments and topologically associating domains, as well as high-resolution conformational features such as DNA loops.
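The conversion of interactions into counts of pair-wise interactions can be illustrated with a minimal binning sketch. It assumes a single chromosome, read pairs given as two genomic positions, and fixed-size bins; the function name and the input representation are hypothetical.

```python
from collections import Counter


def bin_contacts(pairs, bin_size):
    """Count read pairs per unordered pair of fixed-size genomic bins.

    pairs:    iterable of (pos1, pos2) positions on the same chromosome.
    bin_size: bin width in bp, for example 1000 for kilobase resolution.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        bin1, bin2 = pos1 // bin_size, pos2 // bin_size
        # Store each bin pair with the smaller bin first, so that
        # (a, b) and (b, a) contribute to the same count.
        counts[(min(bin1, bin2), max(bin1, bin2))] += 1
    return counts


print(bin_contacts([(1500, 7200), (7900, 1100), (300, 400)], bin_size=1000))
```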
Project description:HiCUP is a pipeline for processing sequence data generated by Hi-C, a technique used to investigate the three-dimensional organisation of a genome. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also provides an easy-to-interpret yet detailed quality control report that may be used by researchers to refine their experimental protocol for future studies. The software is freely available and has already been used for processing Hi-C data in several recently published peer-reviewed research articles. This experiment investigates the impact of using HiCUP to remove putative PCR amplification products in heavily duplicated Capture Hi-C libraries. Examination of three Capture Hi-C libraries
Project description:Hi-C experiments produce large numbers of DNA sequence read pairs that are typically analyzed to deduce genomewide interactions between arbitrary loci. A key step in these experiments is the cleavage of cross-linked chromatin with a restriction endonuclease. Although this cleavage should happen specifically at the enzyme's recognition sequence, an unknown proportion of cleavage events may involve other sequences, owing to the enzyme's star activity or to random DNA breakage. A quantitative estimation of these non-specific cleavages may enable simulating realistic Hi-C read pairs for validation of downstream analyses, monitoring the reproducibility of experimental conditions and investigating biophysical properties that correlate with DNA cleavage patterns. Here we describe a computational method for analyzing Hi-C read pairs to estimate the fractions of cleavages at different possible targets. The method relies on expressing an observed local target distribution downstream of aligned reads as a linear combination of known conditional local target distributions. We validated this method using Hi-C read pairs obtained by computer simulation. Application of the method to experimental Hi-C datasets from murine cells revealed interesting similarities and differences in patterns of cleavage across the various experiments considered.
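The idea of expressing an observed distribution as a linear combination of known distributions can be sketched for the simplest case of two components. The sketch assumes a least-squares fit with weights constrained to sum to one; the fitting criterion, the function name and the toy numbers are assumptions and are not taken from the described method, which considers several possible targets.

```python
def estimate_fraction(observed, dist_a, dist_b):
    """Estimate f in: observed ~ f * dist_a + (1 - f) * dist_b.

    All three arguments are equal-length sequences of probabilities.
    The least-squares solution has a closed form and is clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(dist_a, dist_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two reference distributions are identical")
    num = sum((o - b) * d for o, b, d in zip(observed, dist_b, diff))
    return min(1.0, max(0.0, num / denom))


# Toy example: a mixture built from 80% of dist_a and 20% of dist_b.
dist_a = [0.7, 0.2, 0.1]
dist_b = [0.2, 0.3, 0.5]
observed = [0.8 * a + 0.2 * b for a, b in zip(dist_a, dist_b)]
print(round(estimate_fraction(observed, dist_a, dist_b), 6))
```

With more than two candidate targets, the same idea leads to a constrained linear least-squares problem over a vector of fractions.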
Project description:Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This is more statistically rigorous and may provide more biologically relevant results. Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data. On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.
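To make the notions of low-abundance filtering and differential interaction concrete, the following naive sketch filters bin pairs by their mean count and reports a log2 fold change of library-size-scaled counts between two conditions. It does not reproduce diffHic or the statistical modelling of edgeR; the function name, the thresholds and the pseudocount are assumptions for illustration only.

```python
import math


def log2_fold_changes(counts_a, counts_b, min_mean=5.0, pseudo=0.5):
    """Naive log2 fold change (condition B over A) per bin pair.

    counts_a, counts_b: dicts mapping a bin pair to its read-pair count.
    min_mean:           bin pairs with a lower mean count are dropped.
    pseudo:             pseudocount that avoids taking the log of zero.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both libraries must contain at least one read pair")

    result = {}
    for key in set(counts_a) | set(counts_b):
        a = counts_a.get(key, 0)
        b = counts_b.get(key, 0)
        if (a + b) / 2 < min_mean:
            continue  # low-abundance bin pair, filtered out
        # Scale by library size so that sequencing depth does not drive the ratio.
        scaled_a = (a + pseudo) / total_a
        scaled_b = (b + pseudo) / total_b
        result[key] = math.log2(scaled_b / scaled_a)
    return result


counts_a = {(0, 1): 10, (2, 3): 2, (4, 5): 88}
counts_b = {(0, 1): 40, (2, 3): 1, (4, 5): 59}
print(sorted(log2_fold_changes(counts_a, counts_b)))
```

A fold change alone says nothing about significance; that requires replicates and a model of biological variability, which is the role the abstract assigns to the edgeR framework.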
Project description:Chromatin organisation of trophoblast stem cells (TSC) was compared with that of embryonic stem cells (ESC). The method enriches Hi-C libraries to detect promoter interactions at restriction fragment level. We prepared Hi-C libraries from TSC and ESC (serum grown) samples and enriched them with a promoter capture bait system that captures ~22,000 promoters. Promoter interactions were then analysed using the GOTHiC pipeline.
Project description:<h4>Summary</h4>Capture Hi-C is a powerful approach for detecting chromosomal interactions involving, at least on one end, DNA regions of interest, such as gene promoters. We present Chicdiff, an R package for robust detection of differential interactions in Capture Hi-C data. Chicdiff enhances a state-of-the-art differential testing approach for count data with bespoke normalization and multiple testing procedures that account for specific statistical properties of Capture Hi-C. We validate Chicdiff on published Promoter Capture Hi-C data in human Monocytes and CD4+ T cells, identifying multitudes of cell type-specific interactions, and confirming the overall positive association between promoter interactions and gene expression.<h4>Availability and implementation</h4>Chicdiff is implemented as an R package that is publicly available at https://github.com/RegulatoryGenomicsGroup/chicdiff.<h4>Supplementary information</h4>Supplementary data are available at Bioinformatics online.