OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries.
ABSTRACT: The spatial organization of chromatin in the nucleus has been implicated in regulating gene expression. Maps of high-frequency interactions between different segments of chromatin have revealed topologically associating domains (TADs), within which most of the regulatory interactions are thought to occur. TADs are not homogeneous structural units but appear to be organized into a hierarchy. We present OnTAD, an optimized nested TAD caller from Hi-C data, to identify hierarchical TADs. OnTAD reveals new biological insights into the role of different TAD levels, boundary usage in gene regulation, the loop extrusion model, and compartmental domains. OnTAD is available at https://github.com/anlin00007/OnTAD.
Project description:BACKGROUND:Topologically associating domains (TADs) are genomic regions of self-interaction. Additionally, it is known that TAD boundaries are enriched in CTCF binding sites. In turn, CTCF sites are known to be asymmetric, whereby the convergent configuration of a pair of CTCF sites leads to the formation of a chromatin loop in vivo. However, to date, it has been unclear how to reconcile TAD structure with CTCF-based chromatin loops. RESULTS:We approach this problem by analysing CTCF binding site strengths and classifying clusters of CTCF sites along the genome on the basis of their relative orientation. Analysis of CTCF site orientation classes as a function of their spatial distribution along the human genome reveals that convergent CTCF site clusters are depleted while divergent CTCF clusters are enriched in the 5- to 100-kb range. We then analyse the distribution of CTCF binding sites as a function of TAD boundary conservation across seven primary human blood cell types. This reveals divergent CTCF site enrichment at TAD boundaries. Furthermore, convergent arrays of CTCF sites separate the left and right sections of TADs that harbour internal CTCF sites, resulting in unequal TAD 'halves'. CONCLUSIONS:The orientation-based CTCF binding site cluster classification that we present reconciles TAD boundaries and CTCF site clusters in a mechanistically elegant fashion. This model suggests that the emergent structure of nuclear chromatin in the form of TADs relies on the obligate alternation of divergent and convergent CTCF site clusters that occur at different length scales along the genome.
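The orientation classes named in the abstract above can be illustrated with a small Python sketch. It classifies pairs of neighbouring CTCF sites rather than whole clusters, so it is a simplification and not the authors' method. The function names, the `(position, strand)` input format, and the use of the 5- to 100-kb range as a gap filter are assumptions made for illustration.

```python
from typing import List, Tuple

Site = Tuple[int, str]  # (position in bp, motif strand: '+' or '-'); hypothetical format


def classify_pair(left_strand: str, right_strand: str) -> str:
    """Classify two neighbouring CTCF motifs by their relative orientation."""
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # motifs point towards each other: > <
    if left_strand == "-" and right_strand == "+":
        return "divergent"   # motifs point away from each other: < >
    return "tandem"          # same orientation: > > or < <


def classify_adjacent_sites(
    sites: List[Site], min_gap: int = 5_000, max_gap: int = 100_000
) -> List[Tuple[int, int, str]]:
    """Label each pair of adjacent sites whose spacing lies in [min_gap, max_gap]."""
    ordered = sorted(sites)
    labelled = []
    for (pos_a, strand_a), (pos_b, strand_b) in zip(ordered, ordered[1:]):
        gap = pos_b - pos_a
        if min_gap <= gap <= max_gap:
            labelled.append((pos_a, pos_b, classify_pair(strand_a, strand_b)))
    return labelled
```

For example, sites at 1 kb (+), 20 kb (-), and 60 kb (+) yield one convergent pair followed by one divergent pair; a further site 240 kb downstream would be skipped by the gap filter.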
Project description:The genome folds into a hierarchy of three-dimensional structures within the nucleus. At the sub-megabase scale, chromosomes form topologically associating domains (TADs) [1-4]. However, how TADs fold in single cells is elusive. Here, we reveal TAD features inaccessible to cell population analysis by using super-resolution microscopy. TAD structures and physical insulation associated with their borders are variable between individual cells, yet chromatin intermingling is enriched within TADs compared to adjacent TADs in most cells. The spatial segregation of TADs is further exacerbated during cell differentiation. Favored interactions within TADs are regulated by cohesin and CTCF through distinct mechanisms: cohesin generates chromatin contacts and intermingling while CTCF prevents inter-TAD contacts. Furthermore, TADs are subdivided into discrete nanodomains, which persist in cells depleted of CTCF or cohesin, whereas disruption of nucleosome contacts alters their structural organization. Altogether, these results provide a physical basis for the folding of individual chromosomes at the nanoscale.
Project description:Deciphering the rules of genome folding in the cell nucleus is essential to understand its functions. Recent chromosome conformation capture (Hi-C) studies have revealed that the genome is partitioned into topologically associating domains (TADs), which demarcate functional epigenetic domains defined by combinations of specific chromatin marks. However, whether TADs are true physical units in each cell nucleus or whether they reflect statistical frequencies of measured interactions within cell populations is unclear. Using a combination of Hi-C, three-dimensional (3D) fluorescent in situ hybridization, super-resolution microscopy, and polymer modeling, we provide an integrative view of chromatin folding in Drosophila. We observed that repressed TADs form a succession of discrete nanocompartments, interspersed by less condensed active regions. Single-cell analysis revealed a consistent TAD-based physical compartmentalization of the chromatin fiber, with some degree of heterogeneity in intra-TAD conformations and in cis and trans inter-TAD contact events. These results indicate that TADs are fundamental 3D genome units that engage in dynamic higher-order inter-TAD connections. This domain-based architecture is likely to play a major role in regulatory transactions during DNA-dependent processes.
Project description:Genomes are organized into self-interacting chromatin regions called topologically associated domains (TADs). A significant number of TAD boundaries are shared across multiple cell types and conserved across species. Disruption of TAD boundaries may affect the expression of nearby genes and could lead to several diseases. Even though detection of TAD boundaries is important and useful, there are experimental challenges in obtaining high resolution TAD locations. Here, we present computational prediction of TAD boundaries from high resolution Hi-C data in fruit flies. By extensive exploration and testing of several deep learning model architectures with hyperparameter optimization, we show that a unique deep learning model consisting of three convolution layers followed by a long short-term-memory layer achieves an accuracy of 96%. This outperforms feature-based models' accuracy of 91% and an existing method's accuracy of 73-78% based on motif TRAP scores. Our method also detects previously reported motifs such as Beaf-32 that are enriched in TAD boundaries in fruit flies and also several unreported motifs.
Project description:MOTIVATION:The three-dimensional structure of the genome is an important regulator of many cellular processes including differentiation and gene regulation. Recently, technologies such as Hi-C that combine proximity ligation with high-throughput sequencing have revealed domains of self-interacting chromatin, called topologically associating domains (TADs), in many organisms. Current methods for identifying TADs using Hi-C data assume that TADs are non-overlapping, despite evidence for a nested structure in which TADs and sub-TADs form a complex hierarchy. RESULTS:We introduce a model for decomposition of contact frequencies into a hierarchy of nested TADs. This model is based on empirical distributions of contact frequencies within TADs, where positions that are far apart have a greater enrichment of contacts than positions that are close together. We find that the increase in contact enrichment with distance is stronger for the inner TAD than for the outer TAD in a TAD/sub-TAD pair. Using this model, we develop the TADtree algorithm for detecting hierarchies of nested TADs. TADtree compares favorably with previous methods, finding TADs with a greater enrichment of chromatin marks such as CTCF at their boundaries. AVAILABILITY AND IMPLEMENTATION:A python implementation of TADtree is available at http://compbio.cs.brown.edu/software/ SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:BACKGROUND:Topologically associating domains (TADs) are thought to act as functional units in the genome. TADs co-localise genes and their regulatory elements as well as forming the unit of genome switching between active and inactive compartments. This has led to the speculation that genes which are required for similar processes may fall within the same TADs, allowing them to share regulatory programs and efficiently switch between chromatin compartments. However, evidence to link genes within TADs to the same regulatory program is limited. RESULTS:We investigated the functional similarity of genes which fall within the same TAD. To do this we developed a TAD randomisation algorithm to generate sets of "random TADs" to act as null distributions. We found that while pairs of paralogous genes are enriched in TADs overall, they are largely depleted in TADs with CCCTC-binding factor (CTCF) ChIP-seq peaks at both boundaries. By assessing gene constraint as a proxy for functional importance we found that genes which singly occupy a TAD have greater functional importance than genes which share a TAD, and these genes are enriched for developmental processes. We found little evidence that pairs of genes in CTCF bound TADs are more likely to be co-expressed or share functional annotations than can be explained by their linear proximity alone. CONCLUSIONS:These results suggest that algorithmically defined TADs consist of two functionally different groups, those which are bound by CTCF and those which are not. We detected no association between genes sharing the same CTCF TADs and increased co-expression or functional similarity, other than that explained by linear genome proximity. We do, however, find that functionally important genes are more likely to fall within a TAD on their own suggesting that TADs play an important role in the insulation of these genes.
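The abstract above does not say how its "random TADs" are generated, so the following Python sketch shows only one plausible scheme and should not be read as the authors' algorithm. It assumes non-overlapping TADs on a single chromosome and shuffles the order of TAD lengths and of the gaps between them, so that each random set keeps the observed size distribution. The function name `randomise_tads` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, end exclusive


def randomise_tads(
    tads: List[Interval], chrom_length: int, seed: Optional[int] = None
) -> List[Interval]:
    """Return a random TAD set with the same TAD lengths and gap sizes.

    Assumes the input TADs do not overlap and lie within [0, chrom_length].
    """
    rng = random.Random(seed)
    ordered = sorted(tads)
    lengths = [end - start for start, end in ordered]

    # Collect gaps before the first TAD, between TADs, and after the last TAD.
    gaps = []
    cursor = 0
    for start, end in ordered:
        gaps.append(start - cursor)
        cursor = end
    gaps.append(chrom_length - cursor)

    rng.shuffle(lengths)
    rng.shuffle(gaps)

    # Lay TADs back down left to right; the one unused gap becomes the tail.
    random_tads = []
    cursor = 0
    for gap, length in zip(gaps, lengths):
        start = cursor + gap
        random_tads.append((start, start + length))
        cursor = start + length
    return random_tads
```

Calling the function repeatedly with different seeds gives a set of random TAD maps; a statistic such as the number of paralogous gene pairs sharing a TAD can then be computed on each map to build a null distribution.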
Project description:The metazoan genome is compartmentalized in areas of highly interacting chromatin known as topologically associating domains (TADs). TADs are demarcated by boundaries mostly conserved across cell types and even across species. However, a genome-wide characterization of TAD boundary strength in mammals is still lacking. In this study, we first use fused two-dimensional lasso as a machine learning method to improve Hi-C contact matrix reproducibility, and, subsequently, we categorize TAD boundaries based on their insulation score. We demonstrate that higher TAD boundary insulation scores are associated with elevated CTCF levels and that they may differ across cell types. Intriguingly, we observe that super-enhancers are preferentially insulated by strong boundaries. Furthermore, we demonstrate that strong TAD boundaries and super-enhancer elements are frequently co-duplicated in cancer patients. Taken together, our findings suggest that super-enhancers insulated by strong TAD boundaries may be exploited, as a functional unit, by cancer cells to promote oncogenesis.
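To make the idea of an insulation score concrete, here is a minimal Python sketch of one common formulation: the mean contact frequency in a square window that spans a candidate boundary, with boundaries called at local minima. This is not necessarily the score used in the study above, which also applies fused two-dimensional lasso beforehand. The function names and the `window` parameter are assumptions for illustration.

```python
from typing import List, Optional


def insulation_scores(
    matrix: List[List[float]], window: int
) -> List[Optional[float]]:
    """Mean contact between the `window` bins left of i and the `window` bins from i on.

    Position i marks the boundary between bin i-1 and bin i. Positions too
    close to the matrix edge for a full window are reported as None.
    """
    n = len(matrix)
    scores: List[Optional[float]] = []
    for i in range(n):
        if i < window or i + window > n:
            scores.append(None)
            continue
        total = 0.0
        for r in range(i - window, i):
            for c in range(i, i + window):
                total += matrix[r][c]
        scores.append(total / (window * window))
    return scores


def local_minima(scores: List[Optional[float]]) -> List[int]:
    """Indices whose score is strictly lower than both defined neighbours."""
    minima = []
    for i in range(1, len(scores) - 1):
        prev, cur, nxt = scores[i - 1], scores[i], scores[i + 1]
        if prev is None or cur is None or nxt is None:
            continue
        if cur < prev and cur < nxt:
            minima.append(i)
    return minima
```

On a toy 8-bin matrix with two 4-bin domains (high contact within, low contact between), the score dips at position 4, the junction of the two domains. How deep the dip is relative to its neighbours is one way to rank boundaries from weak to strong.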
Project description:Chromosomes are organized into high-frequency chromatin interaction domains called topologically associating domains (TADs), which are separated from each other by domain boundaries. The molecular mechanisms responsible for TAD formation are not yet fully understood. In Drosophila, it has been proposed that transcription is fundamental for TAD organization while the participation of genetic sequences bound by architectural proteins (APs) remains controversial. Here, we investigate the contribution of domain boundaries to TAD organization and the regulation of gene expression at the Notch gene locus in Drosophila. We find that deletion of domain boundaries results in TAD fusion and long-range topological defects that are accompanied by loss of APs and RNA Pol II chromatin binding as well as defects in transcription. Together, our results provide compelling evidence of the contribution of discrete genetic sequences bound by APs and RNA Pol II in the partition of the genome into TADs and in the regulation of gene expression in Drosophila.
Project description:Mammalian genomes contain several dozen large (>0.5 Mbp) lineage-specific gene loci harbouring functionally related genes. However, spatial chromatin folding, organization of the enhancer-promoter networks and their relevance to Topologically Associating Domains (TADs) in these loci remain poorly understood. TADs are principal units of genome folding and represent DNA regions within which DNA interacts more frequently than it does across the TAD boundary. Here, we used Chromatin Conformation Capture Carbon Copy (5C) technology to characterize the spatial chromatin interaction network in the 3.1 Mb Epidermal Differentiation Complex (EDC) locus harbouring 61 functionally related genes that show lineage-specific activation during terminal keratinocyte differentiation in the epidermis. 5C data validated by 3D-FISH demonstrate that the EDC locus is organized into several TADs showing distinct lineage-specific chromatin interaction networks based on their transcription activity and the gene-rich or gene-poor status. Correlation of the 5C results with genome-wide studies for enhancer-specific histone modifications (H3K4me1 and H3K27ac) revealed that the majority of spatial chromatin interactions that involve the gene-rich TADs at the EDC locus in keratinocytes include both intra- and inter-TAD interaction networks, connecting gene promoters and enhancers. Compared to thymocytes in which the EDC locus is mostly transcriptionally inactive, these interactions were found to be keratinocyte-specific. In keratinocytes, the promoter-enhancer anchoring regions in the gene-rich transcriptionally active TADs are enriched for the binding of chromatin architectural proteins CTCF, Rad21 and chromatin remodeler Brg1. In contrast to gene-rich TADs, gene-poor TADs show preferential spatial contacts with each other, do not contain active enhancers and show decreased binding of CTCF, Rad21 and Brg1 in keratinocytes. Thus, spatial interactions between gene promoters and enhancers at the multi-TAD EDC locus in skin epithelial cells are cell type-specific and involve extensive contacts within TADs as well as between different gene-rich TADs, forming the framework for lineage-specific transcription.
Project description:Nuclear RNA and the act of transcription have been implicated in nuclear organization. However, their global contribution to shaping fundamental features of higher-order chromatin organization such as topologically associated domains (TADs) and genomic compartments remains unclear. To investigate these questions, we perform genome-wide chromatin conformation capture (Hi-C) analysis in the presence and absence of RNase before and after crosslinking, or a transcriptional inhibitor. TAD boundaries are largely unaffected by RNase treatment, although a subtle disruption of compartmental interactions is observed. In contrast, transcriptional inhibition leads to weaker TAD boundary scores. Collectively, our findings demonstrate differences in the relative contribution of RNA and transcription to the formation of TAD boundaries detected by the widely used Hi-C methodology.