ASHIC: hierarchical Bayesian modeling of diploid chromatin contacts and structures.
ABSTRACT: The recently developed Hi-C technique has been widely applied to map genome-wide chromatin interactions. However, current methods for analyzing diploid Hi-C data cannot fully distinguish between homologous chromosomes. Consequently, the existing diploid Hi-C analyses are based on sparse and inaccurate allele-specific contact matrices, which might lead to incorrect modeling of diploid genome architecture. Here we present ASHIC, a hierarchical Bayesian framework to model allele-specific chromatin organizations in diploid genomes. We developed two models under the Bayesian framework: the Poisson-multinomial (ASHIC-PM) model and the zero-inflated Poisson-multinomial (ASHIC-ZIPM) model. The proposed ASHIC methods impute allele-specific contact maps from diploid Hi-C data and simultaneously infer allelic 3D structures. Through simulation studies, we demonstrated that ASHIC methods outperformed existing approaches, especially under low coverage and low SNP density conditions. Additionally, in the analyses of diploid Hi-C datasets in mouse and human, our ASHIC-ZIPM method produced fine-resolution diploid chromatin maps and 3D structures and provided insights into the allelic chromatin organizations and functions. To summarize, our work provides a statistically rigorous framework for investigating fine-scale allele-specific chromatin conformations. The ASHIC software is publicly available at https://github.com/wmalab/ASHIC.
Project description:<h4>Background</h4>In diploid cells, it is important to construct maternal and paternal Hi-C contact maps respectively since the two homologous chromosomes can differ in chromatin three-dimensional (3D) organization. Though previous softwares could construct diploid (maternal and paternal) Hi-C contact maps by using phased genetic variants, they all neglected the systematic biases in diploid Hi-C contact maps caused by variable genetic variant density in the genome. In addition, few of softwares provided quantitative analyses on allele-specific chromatin 3D organization, including compartment, topological domain and chromatin loop.<h4>Results</h4>In this work, we revealed the feature of allele-assignment bias caused by the variable genetic variant density, and then proposed a novel strategy to correct the systematic biases in diploid Hi-C contact maps. Based on the bias correction, we developed an integrated tool, called HiCHap, to perform read mapping, contact map construction, whole-genome identification of compartments, topological domains and chromatin loops, and allele-specific testing for diploid Hi-C data. Our results show that the correction on allele-assignment bias in HiCHap does significantly improve the quality of diploid Hi-C contact maps, which subsequently facilitates the whole-genome identification of diploid chromatin 3D organization, including compartments, topological domains and chromatin loops. Finally, HiCHap also supports the data analysis for haploid Hi-C maps without distinguishing two homologous chromosomes.<h4>Conclusions</h4>We provided an integrated package HiCHap to perform the data processing, bias correction and structural analysis for diploid Hi-C data. The source code and tutorial of software HiCHap are freely available at https://pypi.org/project/HiCHap/ .
Project description:Sequencing reads overlapping polymorphic sites in diploid mammalian genomes may be assigned to one allele or the other. This holds the potential to detect gene expression, chromatin modifications, DNA methylation or nuclear interactions in an allele-specific fashion. SNPsplit is an allele-specific alignment sorter designed to read files in SAM/BAM format and determine the allelic origin of reads or read-pairs that cover known single nucleotide polymorphic (SNP) positions. For this to work libraries must have been aligned to a genome in which all known SNP positions were masked with the ambiguity base 'N' and aligned using a suitable mapping program such as Bowtie2, TopHat, STAR, HISAT2, HiCUP or Bismark. SNPsplit also provides an automated solution to generate N-masked reference genomes for hybrid mouse strains based on the variant call information provided by the Mouse Genomes Project. The unique ability of SNPsplit to work with various different kinds of sequencing data including RNA-Seq, ChIP-Seq, Bisulfite-Seq or Hi-C opens new avenues for the integrative exploration of allele-specific data.
Project description:Knowledge of spatial chromosomal organizations is critical for the study of transcriptional regulation and other nuclear processes in the cell. Recently, chromosome conformation capture (3C) based technologies, such as Hi-C and TCC, have been developed to provide a genome-wide, three-dimensional (3D) view of chromatin organization. Appropriate methods for analyzing these data and fully characterizing the 3D chromosomal structure and its structural variations are still under development. Here we describe a novel Bayesian probabilistic approach, denoted as "Bayesian 3D constructor for Hi-C data" (BACH), to infer the consensus 3D chromosomal structure. In addition, we describe a variant algorithm BACH-MIX to study the structural variations of chromatin in a cell population. Applying BACH and BACH-MIX to a high resolution Hi-C dataset generated from mouse embryonic stem cells, we found that most local genomic regions exhibit homogeneous 3D chromosomal structures. We further constructed a model for the spatial arrangement of chromatin, which reveals structural properties associated with euchromatic and heterochromatic regions in the genome. We observed strong associations between structural properties and several genomic and epigenetic features of the chromosome. Using BACH-MIX, we further found that the structural variations of chromatin are correlated with these genomic and epigenetic features. Our results demonstrate that BACH and BACH-MIX have the potential to provide new insights into the chromosomal architecture of mammalian cells.
Project description:<h4>Background</h4>Co-localized combinations of histone modifications ("chromatin states") have been shown to correlate with promoter and enhancer activity. Changes in chromatin states over multiple time points ("chromatin state trajectories") have previously been analyzed at promoter and enhancers separately. With the advent of time series Hi-C data it is now possible to connect promoters and enhancers and to analyze chromatin state trajectories at promoter-enhancer pairs.<h4>Results</h4>We present TimelessFlex, a framework for investigating chromatin state trajectories at promoters and enhancers and at promoter-enhancer pairs based on Hi-C information. TimelessFlex extends our previous approach Timeless, a Bayesian network for clustering multiple histone modification data sets at promoter and enhancer feature regions. We utilize time series ATAC-seq data measuring open chromatin to define promoters and enhancer candidates. We developed an expectation-maximization algorithm to assign promoters and enhancers to each other based on Hi-C interactions and jointly cluster their feature regions into paired chromatin state trajectories. We find jointly clustered promoter-enhancer pairs showing the same activation patterns on both sides but with a stronger trend at the enhancer side. While the promoter side remains accessible across the time series, the enhancer side becomes dynamically more open towards the gene activation time point. Promoter cluster patterns show strong correlations with gene expression signals, whereas Hi-C signals get only slightly stronger towards activation. The code of the framework is available at https://github.com/henriettemiko/TimelessFlex .<h4>Conclusions</h4>TimelessFlex clusters time series histone modifications at promoter-enhancer pairs based on Hi-C and it can identify distinct chromatin states at promoter and enhancer feature regions and their changes over time.
Project description:How chromatin folds in three-dimensional (3D) space is closely related to transcription regulation. As powerful tools to study such 3D chromatin conformation, the recently developed Hi-C technologies enable a genome-wide measurement of pair-wise chromatin interaction. However, methods for the detection of biologically meaningful chromatin interactions, i.e. peak calling, from Hi-C data, are still under development. In our previous work, we have developed a novel hidden Markov random field (HMRF) based Bayesian method, which through explicitly modeling the non-negligible spatial dependency among adjacent pairs of loci manifesting in high resolution Hi-C data, achieves substantially improved robustness and enhanced statistical power in peak calling. Superior to peak callers that ignore spatial dependency both methodologically and in performance, our previous Bayesian framework suffers from heavy computational costs due to intensive computation incurred by modeling the correlated peak status of neighboring loci pairs and the inference of hidden dependency structure.In this work, we have developed FastHiC, a novel approach based on simulated field approximation, which approximates the joint distribution of the hidden peak status by a set of independent random variables, leading to more tractable computation. Performance comparisons in real data analysis showed that FastHiC not only speeds up our original Bayesian method by more than five times, bus also achieves higher peak calling accuracy.FastHiC is freely accessible at:http://www.unc.edu/?yunmli/FastHiC/ CONTACTS: : firstname.lastname@example.org or email@example.comSupplementary data are available at Bioinformatics online.
Project description:Allele-specific measurements of transcription factor binding from ChIP-seq data are key to dissecting the allelic effects of non-coding variants and their contribution to phenotypic diversity. However, most methods of detecting an allelic imbalance assume diploid genomes. This assumption severely limits their applicability to cancer samples with frequent DNA copy-number changes. Here we present a Bayesian statistical approach called BaalChIP to correct for the effect of background allele frequency on the observed ChIP-seq read counts. BaalChIP allows the joint analysis of multiple ChIP-seq samples across a single variant and outperforms competing approaches in simulations. Using 548 ENCODE ChIP-seq and six targeted FAIRE-seq samples, we show that BaalChIP effectively corrects allele-specific analysis for copy-number variation and increases the power to detect putative cis-acting regulatory variants in cancer genomes.
Project description:Chromosome conformation capture (3C) techniques have revealed many fascinating insights into the spatial organization of genomes. 3C methods typically provide information about chromosomal contacts in a large population of cells, which makes it difficult to draw conclusions about the three-dimensional organization of genomes in individual cells. Recently it became possible to study single cells with Hi-C, a genome-wide 3C variant, demonstrating a high cell-to-cell variability of genome organization. In principle, restraint-based modeling should allow us to infer the 3D structure of chromosomes from single-cell contact data, but suffers from the sparsity and low resolution of chromosomal contacts. To address these challenges, we adapt the Bayesian Inferential Structure Determination (ISD) framework, originally developed for NMR structure determination of proteins, to infer statistical ensembles of chromosome structures from single-cell data. Using ISD, we are able to compute structural error bars and estimate model parameters, thereby eliminating potential bias imposed by ad hoc parameter choices. We apply and compare different models for representing the chromatin fiber and for incorporating singe-cell contact information. Finally, we extend our approach to the analysis of diploid chromosome data.
Project description:<h4>Background</h4>Hi-C and its variant techniques have been developed to capture the spatial organization of chromatin. Normalization of Hi-C contact map is essential for accurate modeling and interpretation of high-throughput chromatin conformation capture (3C) experiments. Hi-C correction tools were originally developed to normalize systematic biases of karyotypically normal cell lines. However, a vast majority of available Hi-C datasets are derived from cancer cell lines that carry multi-level DNA copy number variations (CNVs). CNV regions display over- or under-representation of interaction frequencies compared to CN-neutral regions. Therefore, it is necessary to remove CNV-driven bias from chromatin interaction data of cancer cell lines to generate a euploid-equivalent contact map.<h4>Results</h4>We developed the HiCNAtra framework to compute high-resolution CNV profiles from Hi-C or 3C-seq data of cancer cell lines and to correct chromatin contact maps from systematic biases including CNV-associated bias. First, we introduce a novel 'entire-fragment' counting method for better estimation of the read depth (RD) signal from Hi-C reads that recapitulates the whole-genome sequencing (WGS)-derived coverage signal. Second, HiCNAtra employs a multimodal-based hierarchical CNV calling approach, which outperformed OneD and HiNT tools, to accurately identify CNVs of cancer cell lines. Third, incorporating CNV information with other systematic biases, HiCNAtra simultaneously estimates the contribution of each bias and explicitly corrects the interaction matrix using Poisson regression. HiCNAtra normalization abolishes CNV-induced artifacts from the contact map generating a heatmap with homogeneous signal. When benchmarked against OneD, CAIC, and ICE methods using MCF7 cancer cell line, HiCNAtra-corrected heatmap achieves the least 1D signal variation without deforming the inherent chromatin interaction signal. Additionally, HiCNAtra-corrected contact frequencies have minimum correlations with each of the systematic bias sources compared to OneD's explicit method. Visual inspection of CNV profiles and contact maps of cancer cell lines reveals that HiCNAtra is the most robust Hi-C correction tool for ameliorating CNV-induced bias.<h4>Conclusions</h4>HiCNAtra is a Hi-C-based computational tool that provides an analytical and visualization framework for DNA copy number profiling and chromatin contact map correction of karyotypically abnormal cell lines. HiCNAtra is an open-source software implemented in MATLAB and is available at https://github.com/AISKhalil/HiCNAtra .
Project description:DNA replication occurs in a defined temporal order known as the replication-timing (RT) program. RT is regulated during development in discrete chromosomal units, coordinated with transcriptional activity and 3D genome organization. Here, we derived distinct cell types from F1 hybrid musculus × castaneus mouse crosses and exploited the high single-nucleotide polymorphism (SNP) density to characterize allelic differences in RT (Repli-seq), genome organization (Hi-C and promoter-capture Hi-C), gene expression (total nuclear RNA-seq), and chromatin accessibility (ATAC-seq). We also present HARP, a new computational tool for sorting SNPs in phased genomes to efficiently measure allele-specific genome-wide data. Analysis of six different hybrid mESC clones with different genomes (C57BL/6, 129/sv, and CAST/Ei), parental configurations, and gender revealed significant RT asynchrony between alleles across ?12% of the autosomal genome linked to subspecies genomes but not to parental origin, growth conditions, or gender. RT asynchrony in mESCs strongly correlated with changes in Hi-C compartments between alleles but not as strongly with SNP density, gene expression, imprinting, or chromatin accessibility. We then tracked mESC RT asynchronous regions during development by analyzing differentiated cell types, including extraembryonic endoderm stem (XEN) cells, four male and female primary mouse embryonic fibroblasts (MEFs), and neural precursor cells (NPCs) differentiated in vitro from mESCs with opposite parental configurations. We found that RT asynchrony and allelic discordance in Hi-C compartments seen in mESCs were largely lost in all differentiated cell types, accompanied by novel sites of allelic asynchrony at a considerably smaller proportion of the genome, suggesting that genome organization of homologs converges to similar folding patterns during cell fate commitment.
Project description:Hi-C and chromatin immunoprecipitation (ChIP) have been combined to identify long-range chromatin interactions genome-wide at reduced cost and enhanced resolution, but extracting information from the resulting datasets has been challenging. Here we describe a computational method, MAPS, Model-based Analysis of PLAC-seq and HiChIP, to process the data from such experiments and identify long-range chromatin interactions. MAPS adopts a zero-truncated Poisson regression framework to explicitly remove systematic biases in the PLAC-seq and HiChIP datasets, and then uses the normalized chromatin contact frequencies to identify significant chromatin interactions anchored at genomic regions bound by the protein of interest. MAPS shows superior performance over existing software tools in the analysis of chromatin interactions from multiple PLAC-seq and HiChIP datasets centered on different transcriptional factors and histone marks. MAPS is freely available at https://github.com/ijuric/MAPS.