Analysis of the Functional Relevance of Epigenetic Chromatin Marks in the First Intron Associated with Specific Gene Expression Patterns.
ABSTRACT: We previously showed that the first intron of genes exhibits several interesting characteristics not seen in other introns: 1) it is the longest intron on average in almost all eukaryotes, 2) it presents the highest number of conserved sites, and 3) it exhibits the highest density of regulatory chromatin marks. Here, we expand on our previous study by integrating various multiomics data, leading to further evidence supporting the functionality of sites in the first intron. We first show that trait-associated single-nucleotide polymorphisms (TASs) are significantly enriched in the first intron. We also show that within the first intron, the density of epigenetic chromatin signals is higher near TASs than in distant regions. Furthermore, the distribution of several chromatin regulatory marks is investigated in relation to gene expression specificity (i.e., housekeeping vs. tissue-specific expression), essentiality (essential genes vs. nonessential genes), and levels of gene expression; housekeeping genes or essential genes contain greater proportions of active chromatin marks than tissue-specific genes or nonessential genes, and highly expressed genes exhibit a greater density of chromatin regulatory marks than genes with low expression. Moreover, we observe that genes carrying multiple first-intron TASs interact with each other within a large protein-protein interaction network, ultimately connecting to the UBC protein, a well-established protein involved in ubiquitination. We believe that our results shed light on the functionality of first introns as a genomic entity involved in gene expression regulation.
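The abstract does not say which statistical test supports the TAS enrichment claim. As a minimal sketch of one common choice, the function below computes a one-sided hypergeometric p-value for the overlap between TASs and first-intron SNPs. The function name, its parameters, and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(total_snps, first_intron_snps, total_tas, first_intron_tas):
    """One-sided P(X >= first_intron_tas) under random sampling.

    Assumes total_tas SNPs are drawn without replacement from total_snps SNPs,
    of which first_intron_snps lie in first introns. This is an illustrative
    test, not necessarily the one used in the study.
    """
    denom = comb(total_snps, total_tas)
    upper = min(first_intron_snps, total_tas)
    tail = sum(
        comb(first_intron_snps, k)
        * comb(total_snps - first_intron_snps, total_tas - k)
        for k in range(first_intron_tas, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 1000 SNPs, 200 in first introns, 50 TASs, 20 of them in first introns.
p_value = hypergeom_enrichment_p(1000, 200, 50, 20)
print(f"one-sided enrichment p-value: {p_value:.3g}")
```

With these made-up counts, about 10 first-intron TASs would be expected by chance, so observing 20 yields a small p-value. Real analyses would also need to account for SNP density and linkage disequilibrium, which this sketch ignores.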
Project description:Genomes of higher eukaryotes have surprisingly long first introns and, in some cases, the first introns have been shown to have higher conservation relative to other introns. However, the functional relevance of conserved regions in the first introns is poorly understood. Leveraging recent ENCODE data, here we assess potential regulatory roles of conserved regions in the first intron of human genes. We first show that, relative to other downstream introns, the first introns are enriched for blocks of highly conserved sequences. We also found that the first introns are enriched for several chromatin marks indicative of active regulatory regions, and this enrichment of regulatory marks is correlated with enrichment of conserved blocks in the first intron; the enrichments of conservation and regulatory marks in the first intron are not entirely explained by a general, albeit variable, bias for certain marks toward the 5' end of introns. Interestingly, conservation as well as the proportion of active regulatory chromatin marks in the first intron of a gene correlates positively with the number of exons in the gene, but the correlation is significantly weakened in second introns and negligible beyond the second intron. First intron conservation is also positively correlated with the gene's expression level in several human tissues. Finally, a gene-wise analysis shows significant enrichments of active chromatin marks in conserved regions of first introns, relative to the conserved regions in other introns of the same gene. Taken together, our analyses strongly suggest that first introns are enriched for active transcriptional regulatory signals under purifying selection.
Project description:Essential genes have been studied by copy number variants and deletions, both associated with introns. The premise of our work is that introns of essential genes have distinct characteristic properties. We provide support for this by training a deep learning model and demonstrating that introns alone can be used to classify essentiality. The model, limited to first introns, performs at an increased level, implicating first introns in essentiality. We identify unique properties of introns of essential genes, finding that their structure protects against deletion and intron-loss events, especially centered on the first intron. We show that GC density is increased in the first introns of essential genes, allowing for increased enhancer activity, protection against deletions, and improved splice site recognition. We find that first introns of essential genes are of remarkably smaller size than their nonessential counterparts, and to protect against common 3' end deletion events, essential genes carry an increased number of (smaller) introns. To demonstrate the importance of the seven features we identified, we train a feature-based model using only these features and achieve high performance.
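One of the features mentioned above is GC density in first introns. The abstract does not define how it was measured, so the following is only a minimal sketch of a plain GC fraction over unambiguous bases. The function name and the example sequences are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C.

    Ambiguous symbols such as N are ignored. This is an assumed definition,
    not necessarily the one used in the study.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / (gc + at)


# Hypothetical first-intron fragments, not real sequences.
essential_intron = "GCGCGGATCC"
nonessential_intron = "ATATTAGCAA"
print(gc_fraction(essential_intron), gc_fraction(nonessential_intron))
```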
Project description:Most of the transcribed genes in eukaryotic cells are interrupted by intervening sequences called introns, which are co-transcriptionally removed from nascent messenger RNA through the process of splicing. In Arabidopsis, 79% of genes contain introns and more than 60% of intron-containing genes undergo alternative splicing (AS), which is generally considered to increase protein diversity as one of the intrinsic mechanisms for adapting to a varying environment or to the internal developmental program. In addition, recent findings have brought to light previously overlooked intron functions. Here, we review recent progress in understanding the mechanisms underlying intron function, focusing in particular on unique features of the first intron, which is located in close proximity to the transcription start site. We summarize the distinct deposition of epigenetic marks and nucleosome density on the first intronic DNA sequence, the impact of the first intron on determining the transcription start site and on elongation of its own expression (called intron-mediated enhancement, IME), translation control in the 5'-UTR, and the new mechanism of the trans-acting function of the first intron in regulating gene expression at the post-transcriptional level.
Project description:The Drosophila melanogaster polytene chromosomes are the best model for studying genome organization during interphase. Despite long-term studies of the genetic organization of polytene chromosome bands and interbands, little is known about the location of long genes on chromosomes. To address this, we used bioinformatic approaches to characterize the genome-wide distribution of introns in gene bodies and in different chromatin states, and used fluorescent in situ hybridization to juxtapose them with the chromosome structures. Short introns up to 2 kb in length are located in the bodies of housekeeping genes (grey bands or lazurite chromatin). In the group of the 70 longest genes in the Drosophila genome, introns account for 95% of total gene length. Mapping of 15 long genes showed that they can occupy extended sections of polytene chromosomes containing series of bands and interbands, with promoters located in the interband fragments (aquamarine chromatin). Introns (malachite and ruby chromatin) in polytene chromosomes form independent bands, which can contain either both introns and exons or intron material only. Thus, a novel type of gene arrangement in polytene chromosomes was discovered; peculiarities of such genetic organization are discussed.
Project description:In previous studies, we demonstrated that some sites in the first intron likely regulate gene expression. In the present work, we sought to further confirm the functional relevance of first intron sites by estimating the quantity of rare alleles in the first intron. A basic hypothesis posited herein is that genomic regions carrying more functionally important sites will have a higher proportion of rare alleles. We estimated the proportions of rare single nucleotide polymorphisms with a minor allele frequency < 0.01 located in several histone marks in the first introns of various genes, and compared them with those in other introns and those in 2-kb upstream regions. As expected, rare alleles were found to be significantly enriched in most of the regulatory sites located in the first introns. Meanwhile, transcription factor binding sites were significantly more enriched in the 2-kb upstream regions (i.e., the regions of putative promoters of genes) than in the first introns. These results strongly support our proposal that the first intron sites of genes may have important regulatory functions in gene expression independent of promoters.
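The comparison described above rests on the proportion of SNPs with a minor allele frequency below 0.01 in each genomic region. A minimal sketch of that calculation follows. The function name, the region labels, and every frequency value are hypothetical and serve only to show the arithmetic.

```python
def rare_allele_proportion(mafs, threshold=0.01):
    """Fraction of SNPs whose minor allele frequency is below the threshold."""
    if not mafs:
        raise ValueError("no SNPs supplied")
    return sum(1 for maf in mafs if maf < threshold) / len(mafs)


# Hypothetical minor allele frequencies grouped by region (not real data).
mafs_by_region = {
    "first_intron": [0.002, 0.004, 0.008, 0.15, 0.30],
    "other_introns": [0.005, 0.12, 0.20, 0.35, 0.41],
    "upstream_2kb": [0.001, 0.009, 0.07, 0.25],
}

proportions = {
    region: rare_allele_proportion(mafs)
    for region, mafs in mafs_by_region.items()
}
for region, proportion in proportions.items():
    print(f"{region}: {proportion:.2f}")
```

In a real analysis the frequencies would come from a population variant catalogue restricted to the histone-mark regions of interest, and the differences between regions would be assessed with a significance test rather than compared by eye.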
Project description:Background: Introns and their splicing are tightly coupled with the subsequent mRNA maturation steps, especially nucleocytoplasmic export. A remarkable fraction of vertebrate introns have a minimal size of about 100 bp, while the majority of introns extend to several kilobases or even megabases in length. Principal findings: We carried out analyses on the evolution and function of minimal introns (50-150 bp) in human and mouse genomes. We found that minimal introns are conserved in terms of both length and sequence. They are preferentially located toward the 3' end of mRNA and non-randomly distributed among chromosomes. Both the evolutionary conservation and non-random distribution are indicative of biological relevance. We showed that genes with minimal introns have higher abundance, larger size, and tend to be universally expressed as compared to genes with only large introns and intron-less genes. Genes with minimal introns replicate earlier and preferentially reside in the vicinity of open chromatin, suggesting their unique nuclear position and potential relevance to the regulation of gene expression and transcript export. Conclusions: Based on these observations, we propose a nuclear-export routing model, in which minimal introns play a regulatory role in selectively exporting the highly abundant and large housekeeping genes that reside at the surface of chromatin territories, thus preventing entanglement with other genes located in the interior.
Project description:The cDNAs and genes encoding the intron lariat-debranching enzyme were isolated from the nematode Caenorhabditis elegans and the fission yeast Schizosaccharomyces pombe based on their homology with the Saccharomyces cerevisiae gene. The cDNAs were shown to be functional in an interspecific complementation experiment; they can complement an S. cerevisiae dbr1 null mutant. About 2.5% of budding yeast S. cerevisiae genes have introns, and the accumulation of excised introns in a dbr1 null mutant has little effect on cell growth. In contrast, many S. pombe genes contain introns, and often multiple introns per gene, so that S. pombe is estimated to contain approximately 40 times as many introns as S. cerevisiae. The S. pombe dbr1 gene was disrupted and shown to be nonessential. Like the S. cerevisiae mutant, the S. pombe null mutant accumulated introns to high levels, indicating that intron lariat debranching represents a rate-limiting step in intron degradation in both species. Unlike the S. cerevisiae mutant, the S. pombe dbr1::leu1+ mutant had a severe growth defect and exhibited an aberrant elongated cell shape in addition to an intron accumulation phenotype. The growth defect of the S. pombe dbr1::leu1+ strain suggests that debranching activity is critical for efficient intron RNA degradation and that blocking this pathway interferes with cell growth.
Project description:Background: The packaging of DNA into chromatin regulates transcription from initiation through 3' end processing. One aspect of transcription in which chromatin plays a poorly understood role is the co-transcriptional splicing of pre-mRNA. Results: Here we provide evidence that H2B monoubiquitylation (H2BK123ub1) marks introns in Saccharomyces cerevisiae. A genome-wide map of H2BK123ub1 in this organism reveals that this modification is enriched in coding regions and that its levels peak at the transcribed regions of two characteristic subgroups of genes. First, long genes are more likely to have higher levels of H2BK123ub1, correlating with the postulated role of this modification in preventing cryptic transcription initiation in ORFs. Second, genes that are highly transcribed also have high levels of H2BK123ub1, including the ribosomal protein genes, which comprise the majority of intron-containing genes in yeast. H2BK123ub1 is also a feature of introns in the yeast genome, and the disruption of this modification alters the intragenic distribution of H3 trimethylation on lysine 36 (H3K36me3), which functionally correlates with alternative RNA splicing in humans. In addition, the deletion of genes encoding the U2 snRNP subunits, Lea1 or Msl1, in combination with an htb-K123R mutation, leads to synthetic lethality. Conclusion: These data suggest that H2BK123ub1 facilitates cross talk between chromatin and pre-mRNA splicing by modulating the distribution of intronic and exonic histone modifications.
Project description:Introns are a ubiquitous feature of eukaryotic genomes, and the dynamics of intron evolution between species has been extensively studied. However, comparatively few analyses have focused on the evolutionary forces shaping patterns of intron variation within species. To better understand the population genetic characteristics of introns, we performed an extensive population genetics analysis on key intron splice sequences obtained from 38 strains of Saccharomyces cerevisiae. As expected, we found that purifying selection is the dominant force governing intron splice sequence evolution in yeast, formally confirming that intron-containing alleles are a mutational liability. In addition, through extensive coalescent simulations, we obtain quantitative estimates of the strength of purifying selection (2N(e)s approximately 19) and use diffusion approximations to provide insights into the evolutionary dynamics and sojourn times of newly arising splice sequence mutations in natural yeast populations. In contrast to previous functional studies, evolutionary analyses comparing the prevalence of introns in essential and nonessential genes suggest that introns in nonribosomal protein genes are functionally important and tend to be actively maintained in natural populations of S. cerevisiae. Finally, we demonstrate that heritable variation in splicing efficiency is common in intron-containing genes with splice sequence polymorphisms. More generally, our study highlights the advantages of population genomics analyses for exploring the forces that have generated extant patterns of genome variation and for illuminating basic biological processes.
Project description:Background: It is established that protein-coding exons are preferentially localized in nucleosomes. To examine whether the same is true for non-coding exons, we analysed nucleosome occupancy in and adjacent to internal exons in genes encoding long non-coding RNAs (lncRNAs) in human CD4+ T cells and K562 cells. Results: We confirmed that internal exons in lncRNAs are preferentially associated with nucleosomes, but also observed an elevated signal from H3K4me3-marked nucleosomes in the sequences upstream of these exons. Examination of 200 genomic lncRNA loci chosen at random across all chromosomes showed that high-density regions of H3K4me3-marked nucleosomes, which we term 'slabs', are associated with genomic regions exhibiting intron retention. These retained introns occur in over 50% of lncRNAs examined and are mostly first introns with an average length of just 354 bp, compared to the average length of all human introns of 6355 and 7987 bp in mRNAs and lncRNAs, respectively. Removal of short introns from the dataset abrogated the high upstream H3K4me3 signal, confirming that the association of slabs and short lncRNA introns with intron retention holds genome-wide. The high upstream H3K4me3 signal is also associated with alternatively spliced exons, known to be prominent in lncRNAs. This phenomenon was not observed with mRNAs. Conclusions: There is widespread intron retention and clustered H3K4me3-marked nucleosomes in short first introns of human long non-coding RNAs, which raises intriguing questions about the relationship of intron retention (IR) to lncRNA function and chromatin organization.