The New Frontier of Functional Genomics: From Chromatin Architecture and Noncoding RNAs to Therapeutic Targets.
ABSTRACT: Common diseases are complex, multifactorial disorders whose pathogenesis is influenced by the interplay of genetic predisposition and environmental factors. Genome-wide association studies have interrogated genetic polymorphisms across the genomes of individuals to test associations between genotype and susceptibility to specific disorders, providing insights into the genetic architecture of several complex disorders. However, genetic variants associated with susceptibility to common diseases are often located in noncoding regions of the genome, such as tissue-specific enhancers or long noncoding RNAs, suggesting that regulatory elements might play a relevant role in human diseases. Enhancers are cis-regulatory genomic sequences that act in concert with promoters to regulate gene expression in a precise spatiotemporal manner. They can be located at a considerable distance from their cognate target promoters, which makes them difficult to identify. Genomes are organized in domains of chromatin folding, namely topologically associating domains (TADs). Identification of enhancer-promoter interactions within TADs has revealed principles of cell-type specificity across several organisms and tissues. The vast majority of mammalian genomes are pervasively transcribed, accounting for a previously unappreciated complexity of the noncoding RNA fraction. In particular, long noncoding RNAs have emerged as key players in the establishment of chromatin architecture and the regulation of gene expression. In this perspective, we describe recent advances in the fields of transcriptomics and genome organization, focusing on the role of noncoding genomic variants in the predisposition to common diseases. Finally, we propose a new framework for the identification of the next generation of pharmacological targets for common human diseases.
Project description:Genome-wide association studies (GWAS) have contributed significantly to understanding disease etiology by associating single nucleotide polymorphisms (SNPs) with complex diseases. However, most GWAS-SNPs lie in noncoding regions and may affect distal genes via long-range enhancer-promoter interactions. Thus, common practice in interpreting GWAS discoveries cannot fully reveal the molecular mechanisms underpinning complex diseases. Perturbations of topologically associating domains (TADs) are known to lead to long-range interactions that underlie disease etiology. To identify probable long-range interactions in noncoding regions via GWAS and TADs perturbed by deletions, we integrated datasets of GWAS-SNPs, enhancers, TADs, and deletions. After ranking and clustering, we prioritized 201,132 high-confidence pairs of GWAS-SNPs and target genes. In this study, we performed a systematic inference on noncoding regions via GWAS-SNPs and deletion-perturbed TADs to boost GWAS discovery power. The high-confidence pairs of GWAS-SNPs and target genes (SE-Gs) provide promising candidates for understanding the molecular mechanisms underlying complex diseases, with emphasis on the three-dimensional genome.
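The integration step described above can be illustrated with a minimal Python sketch. It covers only one ingredient, pairing each SNP with the genes that share its TAD, and it is not the authors' pipeline: the deletion-perturbation, ranking, and clustering steps are omitted. The names Interval, find_tad, and pair_snps_with_genes are hypothetical, TADs are assumed not to overlap, and a gene is assigned to a TAD by its start coordinate as a stand-in for the transcription start site.

```python
from bisect import bisect_right
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def tad_index(tads):
    """Group TADs by chromosome and sort them by start coordinate."""
    by_chrom = {}
    for tad in tads:
        by_chrom.setdefault(tad.chrom, []).append(tad)
    for chrom in by_chrom:
        by_chrom[chrom].sort(key=lambda t: t.start)
    return by_chrom


def find_tad(index, chrom, pos):
    """Return the TAD containing pos, or None (assumes non-overlapping TADs)."""
    tads = index.get(chrom, [])
    starts = [t.start for t in tads]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and tads[i].start <= pos < tads[i].end:
        return tads[i]
    return None


def pair_snps_with_genes(snps, genes, tads):
    """Pair each SNP (chrom, pos, snp_id) with every gene in the same TAD."""
    index = tad_index(tads)
    genes_by_tad = {}
    for gene in genes:
        # Assumption: the gene start approximates the transcription start site.
        tad = find_tad(index, gene.chrom, gene.start)
        if tad is not None:
            genes_by_tad.setdefault(tad, []).append(gene.name)
    pairs = []
    for chrom, pos, snp_id in snps:
        tad = find_tad(index, chrom, pos)
        for gene_name in genes_by_tad.get(tad, []):
            pairs.append((snp_id, gene_name, tad.name))
    return pairs


if __name__ == "__main__":
    # Toy coordinates and identifiers, invented for illustration only.
    tads = [Interval("chr1", 0, 1000, "TAD1"), Interval("chr1", 1000, 2000, "TAD2")]
    genes = [Interval("chr1", 100, 200, "GENE_A"), Interval("chr1", 1500, 1600, "GENE_B")]
    snps = [("chr1", 900, "snp_demo1"), ("chr1", 2500, "snp_demo2")]
    print(pair_snps_with_genes(snps, genes, tads))
```

With the toy input, only snp_demo1 is paired (with GENE_A in TAD1), because snp_demo2 falls outside every TAD. A deletion that removes a TAD boundary could be modelled by merging the two flanking TADs before indexing, which is one way to think about the deletion-perturbed case.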
Project description:Developmental genes in metazoan genomes are surrounded by dense clusters of conserved noncoding elements (CNEs). CNEs exhibit unexplained extreme levels of sequence conservation, with many acting as developmental long-range enhancers. Clusters of CNEs define the span of regulatory inputs for many important developmental regulators and have been described previously as genomic regulatory blocks (GRBs). Their function and distribution around important regulatory genes raise the question of how they relate to the 3D conformation of these loci. Here, we show that clusters of CNEs strongly coincide with topological organisation, predicting the boundaries of hundreds of topologically associating domains (TADs) in human and Drosophila. The set of TADs that are associated with high levels of noncoding conservation exhibits distinct properties compared to TADs devoid of extreme noncoding conservation. The close correspondence between extreme noncoding conservation and TADs suggests that these TADs are ancient, revealing a regulatory architecture conserved over hundreds of millions of years. Metazoan genomes contain many clusters of conserved noncoding elements. Here, the authors provide evidence that these clusters coincide with distinct topologically associating domains in humans and Drosophila, revealing a conserved regulatory genomic architecture.
Project description:Conserved Noncoding Elements (CNEs) are elements exhibiting extreme noncoding conservation in Metazoan genomes. They cluster around developmental genes and act as long-range enhancers, yet nothing that we know about their function explains the observed conservation levels. Clusters of CNEs coincide with topologically associating domains (TADs), indicating ancient origins and stability of TAD locations. This has suggested further hypotheses about the still elusive origin of CNEs, and has provided a comparative genomics-based method of estimating the position of TADs around developmentally regulated genes in genomes where chromatin conformation capture data is missing. To enable researchers in gene regulation and chromatin biology to start deciphering this phenomenon, we developed CNEr, an R/Bioconductor toolkit for large-scale identification of CNEs and for studying their genomic properties. We apply CNEr to two novel genome comparisons (fruit fly vs tsetse fly, and two sea urchin genomes) and report novel insights gained from their analysis. We also show how to reveal interesting characteristics of CNEs by coupling CNEr with existing Bioconductor packages. CNEr is available at Bioconductor (https://bioconductor.org/packages/CNEr/) and maintained at GitHub (https://github.com/ge11232002/CNEr).
Project description:BACKGROUND:The human genome is highly organized in the three-dimensional nucleus. Chromosomes fold locally into topologically associating domains (TADs) defined by increased intra-domain chromatin contacts. TADs contribute to gene regulation by restricting chromatin interactions of regulatory sequences, such as enhancers, with their target genes. Disruption of TADs can result in altered gene expression and is associated with genetic diseases and cancers. However, it is not clear to what extent TAD regions are conserved in evolution and whether disruption of TADs by evolutionary rearrangements can alter gene expression. RESULTS:Here, we hypothesize that TADs represent essential functional units of genomes, which are stable against rearrangements during evolution. We investigate this using whole-genome alignments to identify evolutionary rearrangement breakpoints of different vertebrate species. Rearrangement breakpoints are strongly enriched at TAD boundaries and depleted within TADs across species. Furthermore, using gene expression data across many tissues in mouse and human, we show that genes within TADs have more conserved expression patterns. Disruption of TADs by evolutionary rearrangements is associated with changes in gene expression profiles, consistent with a functional role of TADs in gene expression regulation. CONCLUSIONS:Together, these results indicate that TADs are conserved building blocks of genomes with regulatory functions that are often reshuffled as a whole instead of being disrupted by rearrangements.
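The enrichment claim above can be made concrete with a small Python sketch. It compares the observed fraction of breakpoints lying within a window of a TAD boundary against the fraction expected if breakpoints fell uniformly along a chromosome. This is a simplified illustration under stated assumptions, not the study's statistical procedure: the function names, the single-chromosome setup, the window size, and the uniform null model are all hypothetical choices, and the boundary list is assumed to be sorted and non-empty.

```python
from bisect import bisect_left


def nearest_boundary_distance(boundaries, pos):
    """Distance from pos to the closest boundary (boundaries sorted, non-empty)."""
    i = bisect_left(boundaries, pos)
    candidates = []
    if i < len(boundaries):
        candidates.append(boundaries[i] - pos)
    if i > 0:
        candidates.append(pos - boundaries[i - 1])
    return min(candidates)


def observed_fraction(boundaries, breakpoints, window):
    """Fraction of breakpoints within `window` bp of any boundary."""
    near = sum(
        1 for b in breakpoints
        if nearest_boundary_distance(boundaries, b) <= window
    )
    return near / len(breakpoints)


def expected_fraction_uniform(boundaries, chrom_length, window):
    """Fraction of the chromosome covered by +/- window around the boundaries.

    Overlapping windows are merged and clipped to the chromosome so that
    no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for b in boundaries:
        start, end = max(0, b - window), min(chrom_length, b + window)
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_length


def enrichment(boundaries, breakpoints, chrom_length, window):
    """Observed over expected fraction; values above 1 suggest enrichment."""
    expected = expected_fraction_uniform(boundaries, chrom_length, window)
    return observed_fraction(boundaries, breakpoints, window) / expected
```

In a real analysis the uniform null would usually be replaced by permutation or by matched background regions, since breakpoint and boundary positions both depend on local sequence features.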
Project description:Noncoding genetic variation is a major driver of phenotypic diversity, but functional interpretation is challenging. To better understand common genetic variation associated with brain diseases, we defined noncoding regulatory regions for major cell types of the human brain. Whereas psychiatric disorders were primarily associated with variants in transcriptional enhancers and promoters in neurons, sporadic Alzheimer's disease (AD) variants were largely confined to microglia enhancers. Interactome maps connecting disease-risk variants in cell-type-specific enhancers to promoters revealed an extended microglia gene network in AD. Deletion of a microglia-specific enhancer harboring AD-risk variants ablated BIN1 expression in microglia, but not in neurons or astrocytes. These findings revise and expand the list of genes likely to be influenced by noncoding variants in AD and suggest the probable cell types in which they function.
Project description:Mammalian genomes contain several dozen large (>0.5 Mbp) lineage-specific gene loci harbouring functionally related genes. However, spatial chromatin folding, the organization of the enhancer-promoter networks, and their relevance to Topologically Associating Domains (TADs) in these loci remain poorly understood. TADs are principal units of genome folding and represent DNA regions within which DNA interacts more frequently than across the TAD boundary. Here, we used Chromatin Conformation Capture Carbon Copy (5C) technology to characterize the spatial chromatin interaction network in the 3.1 Mb Epidermal Differentiation Complex (EDC) locus harbouring 61 functionally related genes that show lineage-specific activation during terminal keratinocyte differentiation in the epidermis. 5C data validated by 3D-FISH demonstrate that the EDC locus is organized into several TADs showing distinct lineage-specific chromatin interaction networks based on their transcription activity and their gene-rich or gene-poor status. Correlation of the 5C results with genome-wide studies of enhancer-specific histone modifications (H3K4me1 and H3K27ac) revealed that the majority of spatial chromatin interactions that involve the gene-rich TADs at the EDC locus in keratinocytes include both intra- and inter-TAD interaction networks, connecting gene promoters and enhancers. Compared to thymocytes, in which the EDC locus is mostly transcriptionally inactive, these interactions were found to be keratinocyte-specific. In keratinocytes, the promoter-enhancer anchoring regions in the gene-rich transcriptionally active TADs are enriched for the binding of the chromatin architectural proteins CTCF and Rad21 and the chromatin remodeler Brg1. In contrast to gene-rich TADs, gene-poor TADs show preferential spatial contacts with each other, do not contain active enhancers and show decreased binding of CTCF, Rad21 and Brg1 in keratinocytes.
Thus, spatial interactions between gene promoters and enhancers at the multi-TAD EDC locus in skin epithelial cells are cell type-specific and involve extensive contacts within TADs as well as between different gene-rich TADs, forming the framework for lineage-specific transcription.
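The working definition of a TAD used above, more frequent contacts within a domain than across its boundary, can be sketched in Python as a comparison of mean intra-TAD and inter-TAD contact counts in a binned contact matrix. This is a toy illustration and not the 5C analysis of the study: the function name, the matrix values, and the bin-to-TAD assignment are hypothetical, and the sketch ignores the decay of contact frequency with genomic distance, which real analyses normalise for.

```python
def mean_intra_inter_contacts(matrix, tad_of_bin):
    """Return (mean intra-TAD, mean inter-TAD) contact counts.

    matrix: symmetric square list of lists of contact counts between bins.
    tad_of_bin: TAD label for each bin, in the same order as the matrix rows.
    Only the upper triangle (i < j) is used, so each bin pair counts once and
    the diagonal is skipped.
    """
    intra, inter = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if tad_of_bin[i] == tad_of_bin[j]:
                intra.append(matrix[i][j])
            else:
                inter.append(matrix[i][j])

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(intra), mean(inter)


if __name__ == "__main__":
    # Invented 4-bin contact matrix: bins 0-1 form one TAD, bins 2-3 another.
    contacts = [
        [0, 10, 2, 1],
        [10, 0, 3, 2],
        [2, 3, 0, 8],
        [1, 2, 8, 0],
    ]
    labels = ["TAD_A", "TAD_A", "TAD_B", "TAD_B"]
    intra_mean, inter_mean = mean_intra_inter_contacts(contacts, labels)
    print(intra_mean, inter_mean)
```

For the toy matrix the intra-TAD mean is 9.0 and the inter-TAD mean is 2.0, the pattern expected of well-insulated domains. Contacts between two gene-rich TADs, as reported for the EDC locus, would appear as elevated inter-TAD values for specific pairs of labels.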
Project description:The contraction pattern of the heart relies on the activation and conduction of the electrical impulse. Perturbations of cardiac conduction have been associated with congenital and acquired arrhythmias as well as cardiac arrest. The pattern of conduction depends on the regulation of heterogeneous gene expression by key transcription factors and transcriptional enhancers. Here, we assessed the genome-wide occupation of conduction system-regulating transcription factors TBX3, NKX2-5, and GATA4 and of enhancer-associated coactivator p300 in the mouse heart, uncovering cardiac enhancers throughout the genome. Many of the enhancers colocalized with ion channel genes repressed by TBX3, including the clustered sodium channel genes Scn5a, essential for cardiac function, and Scn10a. We identified 2 enhancers in the Scn5a/Scn10a locus, which were regulated by TBX3 and its family member and activator, TBX5, and are functionally conserved in humans. We also provided evidence that a SNP in the SCN10A enhancer associated with alterations in cardiac conduction patterns in humans disrupts TBX3/TBX5 binding and reduces the cardiac activity of the enhancer in vivo. Thus, the identification of key regulatory elements for cardiac conduction helps to explain how genetic variants in noncoding regulatory DNA sequences influence the regulation of cardiac conduction and the predisposition for cardiac arrhythmias.
Project description:Tandem repeats are common features of both prokaryote and eukaryote genomes, where they can be found not only in intergenic regions but also in both the noncoding and coding regions of a variety of different genes. The repeat expansion diseases are a group of human genetic disorders caused by long and highly polymorphic tandem repeats. These disorders provide many examples of the effects that such repeats can have on many biological processes. While repeats in the coding sequence can result in the generation of toxic or malfunctioning proteins, noncoding repeats can also have significant effects including the generation of chromosome fragility, the silencing of the genes in which they are located, the modulation of transcription and translation, and the sequestering of proteins involved in processes such as splicing and cell architecture.
Project description:CTCF (CCCTC-binding factor) is a transcription regulator with hundreds of binding sites in the human genome. Its main function is as an insulator protein, defining together with cohesins the boundaries of areas of the genome called topologically associating domains (TADs). TADs contain regulatory elements, such as enhancers, that regulate the transcription of genes inside the boundaries of the TAD while being restricted from regulating genes outside these boundaries. This paper will examine the most common genetic lesions of CTCF as well as its related protein CTCFL (CTCF-like, also called BORIS) in cancer using publicly available data from published genomic studies. Cancer types where abnormalities in the two genes are more common will be examined for possible associations with underlying repair defects or other prevalent genetic lesions. The putative functional effects of CTCF and CTCFL lesions will also be explored.
Project description:Genome-wide association studies found that increased risk for atrial fibrillation (AF), the most common human heart arrhythmia, is associated with noncoding sequence variants located in proximity to PITX2. Cardiomyocyte-specific epigenomics and comparative genomics uncovered 2 AF-associated enhancers neighboring PITX2 with varying conservation in mice. Chromosome conformation capture experiments in mice revealed that the Pitx2c promoter directly contacted the AF-associated enhancer regions. CRISPR/Cas9-mediated deletion of a 20-kb topologically engaged enhancer led to reduced Pitx2c transcription and AF predisposition. Allele-specific chromatin immunoprecipitation sequencing on hybrid heterozygous enhancer knockout mice revealed that long-range interaction of an AF-associated region with the Pitx2c promoter was required for maintenance of the Pitx2c promoter chromatin state. Long-range looping was mediated by CCCTC-binding factor (CTCF), since genetic disruption of the intronic CTCF-binding site caused reduced Pitx2c expression, AF predisposition, and diminished active chromatin marks on Pitx2. AF risk variants located at 4q25 reside in genomic regions possessing long-range transcriptional regulatory functions directed at PITX2.