Mutational Biases Drive Elevated Rates of Substitution at Regulatory Sites across Cancer Types.
ABSTRACT: Disruption of gene regulation is known to play major roles in carcinogenesis and tumour progression. Here, we comprehensively characterize the mutational profiles of diverse transcription factor binding sites (TFBSs) across 1,574 completely sequenced cancer genomes encompassing 11 tumour types. We assess the relative rates and impact of the mutational burden at the binding sites of 81 transcription factors (TFs), by comparing the abundance and patterns of single base substitutions within putatively functional binding sites to control sites with matched sequence composition. There is a strong (1.43-fold) and significant excess of mutations at functional binding sites across TFs, and the mutations that accumulate in cancers are typically more disruptive than variants tolerated in extant human populations at the same sites. CTCF binding sites suffer an exceptionally high mutational load in cancer (3.31-fold excess) relative to control sites, and we demonstrate for the first time that this effect is seen in essentially all cancer types with sufficient data. The subset of CTCF sites involved in higher order chromatin structures has the highest mutational burden, suggesting a widespread breakdown of chromatin organization. However, we find no evidence for selection driving these distinctive patterns of mutation. The mutational load at CTCF binding sites is substantially determined by replication timing and the mutational signature of the tumour in question, suggesting that selectively neutral processes underlie the unusual mutation patterns. Pervasive hyper-mutation within transcription factor binding sites rewires the regulatory landscape of the cancer genome, but it is dominated by mutational processes rather than selection.
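The fold-excess figures quoted in the abstract above compare substitution rates at functional binding sites with rates at matched control sites. A minimal sketch of such a ratio follows, assuming it is a simple per-base rate ratio; the abstract does not give the exact estimator, so the function name `fold_excess`, its parameters, and the counts are hypothetical and serve only as illustration.

```python
def fold_excess(mut_functional, bp_functional, mut_control, bp_control):
    """Per-base substitution rate at functional sites divided by the rate
    at matched control sites (hypothetical helper, not the authors' code)."""
    if bp_functional <= 0 or bp_control <= 0:
        raise ValueError("site lengths must be positive")
    if mut_control == 0:
        raise ValueError("no mutations in control sites; ratio is undefined")
    rate_functional = mut_functional / bp_functional
    rate_control = mut_control / bp_control
    return rate_functional / rate_control


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not data from the study.
    ratio = fold_excess(mut_functional=30, bp_functional=10_000,
                        mut_control=20, bp_control=10_000)
    print(f"fold excess: {ratio:.2f}")
```

A ratio above 1 indicates more substitutions per base at the functional sites than at the controls. Judging whether the excess is significant would need a separate statistical test, which this sketch does not attempt.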
Project description:BACKGROUND: Chromatin loops form a basic unit of interphase nuclear organization, with chromatin loop anchor points providing contacts between regulatory regions and promoters. However, the mutational landscape at these anchor points remains under-studied. Here, we describe the unusual patterns of somatic mutations and germline variation associated with loop anchor points and explore the underlying features influencing these patterns. RESULTS: Analyses of whole genome sequencing datasets reveal that anchor points are strongly depleted for single nucleotide variants (SNVs) in tumours. Despite low SNV rates in their genomic neighbourhood, anchor points emerge as sites of evolutionary innovation, showing enrichment for structural variant (SV) breakpoints and a peak of SNVs at focal CTCF sites within the anchor points. Both CTCF-bound and non-CTCF anchor points harbour an excess of SV breakpoints in multiple tumour types and are prone to double-strand breaks in cell lines. Common fragile sites, which are hotspots for genome instability, also show elevated numbers of intersecting loop anchor points. Recurrently disrupted anchor points are enriched for genes with functions in cell cycle transitions and regions associated with predisposition to cancer. We also discover a novel class of CTCF-bound anchor points which overlap meiotic recombination hotspots and are enriched for the core PRDM9 binding motif, suggesting that the anchor points have been foci for diversity generated during recent human evolution. CONCLUSIONS: We suggest that the unusual chromatin environment at loop anchor points underlies the elevated rates of variation observed, marking them as sites of regulatory importance but also genomic fragility.
Project description:Homeotic genes code for key transcription factors (HOX-TFs) that pattern the animal body plan. During embryonic development, Hox genes are expressed in overlapping patterns and function in a partially redundant manner. In vitro biochemical screens probing the HOX-TF sequence specificity revealed largely overlapping sequence preferences, indicating that co-factors might modulate the biological function of HOX-TFs. However, due to their overlapping expression pattern, high protein homology, and insufficiently specific antibodies, little is known about their genome-wide binding preferences. In order to overcome this problem, we virally expressed tagged versions of limb-expressed posterior HOX genes (HOXA9-13, and HOXD9-13) in primary chicken mesenchymal limb progenitor cells (micromass). We determined the effect of each HOX-TF on cellular differentiation (chondrogenesis) and gene expression and found that groups of HOX-TFs induce distinct regulatory programs. We used ChIP-seq to determine their individual genome-wide binding profiles and identified between 12,721 and 28,572 binding sites for each of the nine HOX-TFs. Principal Component Analysis (PCA) of binding profiles revealed that the HOX-TFs are clustered in two subgroups (Group 1: HOXA/D9, HOXA/D10, HOXD12, and HOXA13 and Group 2: HOXA/D11 and HOXD13), which are characterized by differences in their sequence specificity and by the presence of cofactor motifs. Specifically, we identified CTCF binding sites in Group 1, indicating that this subgroup of HOX-proteins cooperates with CTCF. We confirmed this interaction by an independent biological assay (Proximity Ligation Assay) and demonstrated that CTCF is a novel HOX cofactor that specifically associates with Group 1 HOX-TFs, pointing towards a possible interplay between HOX-TFs and chromatin architecture.
Project description:Recent evidence shows that the disruption of constitutive insulated neighbourhoods might lead to oncogene dysregulation. We present here a systematic pan-cancer characterisation of the associations between constitutive boundaries and genome alterations in cancer. Specifically, we investigate the enrichment of somatic mutation, abnormal methylation, and copy number alteration events in the proximity of CTCF bindings overlapping with topological boundaries (junctions) in 26 cancer types. Focusing on CTCF motifs that are both in-boundary (overlapping with junctions) and active (overlapping with peaks of CTCF expression), we find a significant enrichment of somatic mutations in several cancer types. Furthermore, mutated junctions are significantly conserved across cancer types, and we also observe a positive selection of transversions rather than transitions in many cancer types. We also analyzed the mutational signatures found on the different classes of CTCF motifs, finding some signatures (such as SBS26) to have a higher weight within in-boundary than off-boundary motifs. Regarding methylation, we find a significant number of over-methylated active in-boundary CTCF motifs in several cancer types; like the somatically mutated junctions, they are also significantly conserved across cancer types. Finally, in several cancer types we observe that copy number alterations tend to overlap with active junctions more often than in matched normal samples. While several articles have recently reported a mutational enrichment at CTCF binding sites for specific cancer types, our analysis is pan-cancer and investigates abnormal methylation and copy number alterations in addition to somatic mutations. Our method is fully replicable and suggests several follow-up tumour-specific analyses.
Project description:At least 25 inherited disorders in humans result from microsatellite repeat expansion. Dramatic variation in repeat instability occurs at different disease loci and between different tissues; however, cis-elements and trans-factors regulating the instability process remain undefined. Genomic fragments from the human spinocerebellar ataxia type 7 (SCA7) locus, containing a highly unstable CAG tract, were previously introduced into mice to localize cis-acting "instability elements," and revealed that genomic context is required for repeat instability. The critical instability-inducing region contained binding sites for CTCF -- a regulatory factor implicated in genomic imprinting, chromatin remodeling, and DNA conformation change. To evaluate the role of CTCF in repeat instability, we derived transgenic mice carrying SCA7 genomic fragments with CTCF binding-site mutations. We found that CTCF binding-site mutation promotes triplet repeat instability both in the germ line and in somatic tissues, and that CpG methylation of CTCF binding sites can further destabilize triplet repeat expansions. As CTCF binding sites are associated with a number of highly unstable repeat loci, our findings suggest a novel basis for demarcation and regulation of mutational hot spots and implicate CTCF in the modulation of genetic repeat instability.
Project description:Cell-type diversity is governed in part by differential gene expression programs mediated by transcription factor (TF) binding. However, there are few systematic studies of the genomic binding of different types of TFs across a wide range of human cell types, especially in relation to gene expression. In the ENCODE Project, we have identified the genomic binding locations across 11 different human cell types of CTCF, RNA Pol II (RNAPII), and MYC, three TFs with diverse roles. Our data and analysis revealed how these factors bind in relation to genomic features and shape gene expression and cell-type specificity. CTCF bound predominantly in intergenic regions while RNAPII and MYC preferentially bound to core promoter regions. CTCF sites were relatively invariant across diverse cell types, while MYC showed the greatest cell-type specificity. MYC and RNAPII co-localized at many of their binding sites and putative target genes. Cell-type specific binding sites, in particular for MYC and RNAPII, were associated with cell-type specific functions. Patterns of binding in relation to gene features were generally conserved across different cell types. RNAPII occupancy was higher over exons than adjacent introns, likely reflecting a link between transcriptional elongation and splicing. TF binding was positively correlated with the expression levels of their putative target genes, but combinatorial binding, in particular of MYC and RNAPII, was even more strongly associated with higher gene expression. These data illuminate how combinatorial binding of transcription factors in diverse cell types is associated with gene expression and cell-type specific biology.
Project description:BACKGROUND: The chromatin insulator CCCTC-binding factor (CTCF) displays tissue-specific DNA binding sites that regulate transcription and chromatin organization. Despite evidence linking CTCF to the protection of epigenetic states through barrier insulation, the impact of CTCF loss on genome-wide DNA methylation sites in human cancer remains undefined. RESULTS: Here, we demonstrate that prostate and breast cancers within The Cancer Genome Atlas (TCGA) exhibit frequent copy number loss of CTCF and that this loss is associated with increased DNA methylation events that occur preferentially at CTCF binding sites. CTCF sites differ among tumor types and result in tissue-specific methylation patterns with little overlap between breast and prostate cancers. DNA methylation and transcriptome profiling in vitro establish that forced downregulation of CTCF leads to spatially distinct DNA hypermethylation surrounding CTCF binding sites, loss of CTCF binding, and decreased gene expression that is also seen in human tumors. DNA methylation inhibition reverses loss of expression at these CTCF-regulated genes. CONCLUSION: These findings establish CTCF loss as a major mediator in directing localized DNA hypermethylation events in a tissue-specific fashion and further support its role as a driver of the cancer phenotype.
Project description:Somatic mutations of many cancer genes tend to co-occur (termed co-mutations) in certain patterns during tumor initiation and progression. However, the genetic and epigenetic mechanisms that contribute to the co-mutations of these cancer genes have yet to be explored. Here, we systematically investigated the association between the somatic co-mutations of cancer genes and high-order chromatin conformation. Significantly, somatic point co-mutations in protein-coding genes were closely associated with high-order spatial chromatin folding. We propose that these regions be termed Spatial Co-mutation Hotspots (SCHs) and report their occurrence in different cancer types. The conserved mutational signatures and DNA sequences flanking these point co-mutations, as well as CTCF-binding sites, are also enriched within the SCH regions. The genetic alterations that are harboured in the same SCHs tend to disrupt cancer driver genes involved in multiple signalling pathways. The present work demonstrates that high-order spatial chromatin organisation may contribute to the somatic co-mutations of certain cancer genes during tumor development.
Project description:Tissue-specific driver mutations in non-coding genomic regions remain undefined for most cancer types. Here, we unbiasedly analyze 212 gastric cancer (GC) whole genomes to identify recurrently mutated non-coding regions in GC. Applying comprehensive statistical approaches to accurately model background mutational processes, we observe significant enrichment of non-coding indels (insertions/deletions) in three gastric lineage-specific genes. We further identify 34 mutation hotspots, of which 11 overlap CTCF binding sites (CBSs). These CBS hotspots remain significant even after controlling for a genome-wide elevated mutation rate at CBSs. In 3 out of 4 tested CBS hotspots, mutations are nominally associated with expression change of neighboring genes. CBS hotspot mutations are enriched in tumors showing chromosomal instability, co-occur with neighboring chromosomal aberrations, and are common in gastric (25%) and colorectal (19%) tumors but rare in other cancer types. Mutational disruption of specific CBSs may thus represent a tissue-specific mechanism of tumorigenesis conserved across gastrointestinal cancers.
Project description:Transposable elements (TEs) represent a substantial fraction of many eukaryotic genomes, and transcriptional regulation of these factors is important to determine TE activities in human cells. However, due to the repetitive nature of TEs, identifying transcription factor (TF)-binding sites from ChIP-sequencing (ChIP-seq) datasets is challenging. Current algorithms are focused on subtle differences between TE copies and thus bias the analysis to relatively old and inactive TEs. Here we describe an approach termed "MapRRCon" (mapping repeat reads to a consensus) which allows us to identify proteins binding to TE DNA sequences by mapping ChIP-seq reads to the TE consensus sequence after whole-genome alignment. Although this method does not assign binding sites to individual insertions in the genome, it provides a landscape of interacting TFs by capturing factors that bind to TEs under various conditions. We applied this method to screen TFs' interaction with L1 in human cells/tissues using ENCODE ChIP-seq datasets and identified 178 of the 512 TFs tested as bound to L1 in at least one biological condition with most of them (138) localized to the promoter. Among these L1-binding factors, we focused on Myc and CTCF, as they play important roles in cancer progression and 3D chromatin structure formation. Furthermore, we explored the transcriptomes of The Cancer Genome Atlas breast and ovarian tumor samples in which a consistent anti-/correlation between L1 and Myc/CTCF expression was observed, suggesting that these two factors may play roles in regulating L1 transcription during the development of such tumors.
Project description:BACKGROUND: The three-dimensional genome organization is critical for gene regulation and can malfunction in diseases like cancer. As a key regulator of genome organization, CCCTC-binding factor (CTCF) has been characterized as a DNA-binding protein with important functions in maintaining the topological structure of chromatin and inducing DNA looping. Among the prolific binding sites in the genome, several events with altered CTCF occupancy have been reported as associated with effects in physiology or disease. However, hitherto there is no comprehensive survey of genome-wide CTCF binding patterns across different human cancers. RESULTS: To dissect functions of CTCF binding, we systematically analyze over 700 CTCF ChIP-seq profiles across human tissues and cancers and identify cancer-specific CTCF binding patterns in six cancer types. We show that cancer-specific lost and gained CTCF binding events are associated with altered chromatin interactions, partially with DNA methylation changes, and rarely with sequence mutations. While lost bindings primarily occur near gene promoters, most gained CTCF binding events exhibit enhancer activities and are induced by oncogenic transcription factors. We validate these findings in T cell acute lymphoblastic leukemia cell lines and patient samples and show that oncogenic NOTCH1 induces specific CTCF binding and they cooperatively activate expression of target genes, indicating transcriptional condensation phenomena. CONCLUSIONS: Specific CTCF binding events occur in human cancers. Cancer-specific CTCF binding can be induced by other transcription factors to regulate oncogenic gene expression. Our results substantiate CTCF binding alteration as a functional epigenomic signature of cancer.