Functional and structural basis of extreme conservation in vertebrate 5’ untranslated regions [icm2]
ABSTRACT: The lack of knowledge about extreme conservation in genomes remains a major gap in our understanding of the evolution of gene regulation. Here, we reveal an unexpected role of extremely conserved 5’UTRs in non-canonical translational regulation that is linked to the emergence of essential developmental features in vertebrate species. Endogenous deletion of conserved elements within these 5’UTRs decreased gene expression, and extremely conserved 5’UTRs possess cis-regulatory elements that promote cell-type-specific regulation of translation. We further developed in-cell mutate-and-map (icM2), a novel method that maps RNA structure inside cells. Using icM2, we determined that an extremely conserved 5’UTR encodes multiple alternative structures and that every single nucleotide within the conserved element helps maintain the balance of alternative structures that is important for controlling the dynamic range of protein expression. These results explain how extreme sequence conservation can lead to RNA-level biological functions encoded in the untranslated regions of vertebrate genomes.
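The abstract states only that icM2 mutates a conserved element and maps the resulting structural changes; the protocol itself is not described here. As a hedged illustration of what a mutate-and-map mutant library looks like, the sketch below enumerates every single-nucleotide substitution across a chosen window. The function name `single_mutants` and the choice of all three substitutions per position are assumptions for illustration; an actual icM2 design may use a single substitution per position or a different scheme.

```python
from typing import List, Optional, Tuple

RNA_BASES = "ACGU"


def single_mutants(seq: str, start: int = 0, end: Optional[int] = None) -> List[Tuple[str, str]]:
    """Enumerate every single-nucleotide substitution in seq[start:end].

    Hypothetical helper, not the icM2 protocol. Returns (label, mutant) pairs;
    labels are 1-based, e.g. "G5A". DNA input is converted to RNA (T -> U).
    """
    seq = seq.upper().replace("T", "U")
    end = len(seq) if end is None else end
    mutants = []
    for i in range(start, end):
        wild_type = seq[i]
        # Each position gets one mutant per non-wild-type base (three in total).
        for base in RNA_BASES:
            if base != wild_type:
                label = f"{wild_type}{i + 1}{base}"
                mutants.append((label, seq[:i] + base + seq[i + 1:]))
    return mutants


if __name__ == "__main__":
    for label, mutant in single_mutants("GGACU", start=1, end=2):
        print(label, mutant)
```

Restricting `start`/`end` to the conserved element keeps the library size at three variants per nucleotide, which is the scale at which each position's contribution to the structural balance can be probed individually.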
Project description: We report a genome-wide screen of vertebrate genomes for conserved RNA structures (CRSs) using CMfinder [Yao et al. Bioinformatics 2006] and predict around 515k human genomic regions that harbor such structures. For a customized CaptureSeq experiment, 125k probes of 60 bp length were synthesized for both strands. These probes target 77k CRS regions, which were selected by highest CRS prediction score (pscore), more than 3 substitutions relative to their closest genomic paralog, and conservation in human and mouse.
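The three selection criteria for the capture targets can be read as two filters followed by a ranking. The sketch below encodes that reading; the record type `CRSRegion`, its field names, and the order of operations (filter first, then take the top-scoring regions) are assumptions, since the description does not say how the criteria were combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CRSRegion:
    """Hypothetical record for one predicted conserved RNA structure region."""
    region_id: str
    pscore: float                # CRS prediction score
    paralog_substitutions: int   # substitutions relative to the closest genomic paralog
    conserved_human_mouse: bool  # structure conserved in both human and mouse


def select_capture_targets(regions: List[CRSRegion], n_targets: int) -> List[CRSRegion]:
    """Keep regions passing both filters, then return the n_targets highest-pscore ones."""
    eligible = [
        r for r in regions
        if r.paralog_substitutions > 3 and r.conserved_human_mouse
    ]
    eligible.sort(key=lambda r: r.pscore, reverse=True)
    return eligible[:n_targets]


if __name__ == "__main__":
    demo = [
        CRSRegion("a", 50.0, 5, True),
        CRSRegion("b", 90.0, 2, True),   # too similar to a paralog
        CRSRegion("c", 70.0, 4, True),
        CRSRegion("d", 99.0, 10, False), # not conserved in mouse
    ]
    print([r.region_id for r in select_capture_targets(demo, 77_000)])
```

The paralog filter matters for a capture design: regions with 3 or fewer differences from a paralog would likely cross-hybridize, so probes could not distinguish the target locus.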
Project description: Through alternative processing of pre-mRNAs, individual mammalian genes often produce multiple mRNA and protein isoforms that may have related, distinct or even opposing functions. Here we report an in-depth analysis of 15 diverse human tissue and cell line transcriptomes based on deep sequencing of cDNA fragments, yielding a digital inventory of gene and mRNA isoform expression. Analysis of mappings of sequence reads to exon-exon junctions indicated that ~94% of human genes undergo alternative splicing (AS), ~86% with a minor isoform frequency of 15% or more. Differences in isoform-specific read densities indicated that a majority of AS and alternative cleavage and polyadenylation (APA) events exhibit variation between tissues. Variations in alternative mRNA isoform expression between 6 individuals were also detected in cerebellar cortex, with ~2- to 3-fold less isoform variation observed between individuals than between tissues. Extreme or 'switch-like' regulation of splicing between tissues was associated with increased sequence conservation and with generation of full-length open reading frames. Patterns of AS and APA were strongly correlated across tissues, suggesting coordinated regulation, and sequence conservation of known regulatory motifs in both regulated introns and 3' UTRs suggested common involvement of the same factors in regulation of tissue-specific splicing and polyadenylation. Examination of mRNA expression in 15 human tissues and cell lines.
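The "minor isoform frequency of 15% or more" criterion can be made concrete for a simple cassette-exon event. The sketch below assumes a simplified model: inclusion reads come from two junctions and skipping reads from one, so each count is divided by its junction count before computing the inclusion fraction (often called percent spliced in, PSI). The function names and this normalization are illustrative assumptions, not the study's exact read-density method.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int,
                       inclusion_junctions: int = 2, exclusion_junctions: int = 1) -> float:
    """Fraction of transcripts that include a cassette exon (simplified model).

    Each count is divided by the number of junctions that isoform contributes,
    so the inclusion and skipping read densities are comparable.
    """
    inc = inclusion_reads / inclusion_junctions
    exc = exclusion_reads / exclusion_junctions
    if inc + exc == 0:
        raise ValueError("no junction reads for this event")
    return inc / (inc + exc)


def minor_isoform_frequency(psi: float) -> float:
    """Frequency of the less abundant of the two isoforms."""
    return min(psi, 1.0 - psi)


def is_frequent_alternative(inclusion_reads: int, exclusion_reads: int,
                            threshold: float = 0.15) -> bool:
    """True if the minor isoform makes up at least `threshold` of expression."""
    psi = percent_spliced_in(inclusion_reads, exclusion_reads)
    return minor_isoform_frequency(psi) >= threshold


if __name__ == "__main__":
    print(is_frequent_alternative(20, 10))   # balanced isoforms
    print(is_frequent_alternative(180, 10))  # minor isoform near 10%
```

Using the minor isoform rather than PSI itself makes the threshold symmetric: an event counts as frequently alternative whether the exon is mostly included or mostly skipped.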
Project description: Computer algorithms are often used to identify tRNA genes in newly sequenced genomes, but these predictions can be challenging. Not only are there structural variations and extremely limited sequence conservation among genes, but vertebrate genomes also tend to have highly reiterated short interspersed elements (SINEs) that originally derived from tRNA genes or tRNA-like transcription units. We used two programs, tRNAscan-SE and ARAGORN, to predict the tRNA genes in the mouse nuclear genome, resulting in quite diverse but overlapping predicted gene sets. From these, we removed known SINE repeats and sorted the genes into predicted families and single-copy genes. In particular, four families of intron-containing tRNA genes were predicted, with introns in positions and structures analogous to the well-characterized intron-containing tRNA genes in yeast. In this work we focus on verifying the expression of the intron-containing tRNA gene families, as well as the other 30 tRNA gene families. Keywords: tRNA, direct label
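The curation steps described above (combine two overlapping prediction sets, discard SINE-derived loci, then split the rest into families and single-copy genes) can be sketched as follows. Several simplifications are assumptions: predictions from tRNAscan-SE and ARAGORN are merged only when their coordinates match exactly, SINE removal uses simple interval overlap, and a "family" is taken to mean loci with an identical sequence. The names `TRNA`, `overlaps`, and `classify` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class TRNA(NamedTuple):
    """Hypothetical predicted tRNA locus (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    sequence: str


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def classify(pred_a: Iterable[TRNA], pred_b: Iterable[TRNA],
             sines: List[Tuple[str, int, int]]) -> Tuple[Dict[str, List[TRNA]], List[TRNA]]:
    """Union two prediction sets, drop SINE-overlapping loci, group by sequence."""
    merged = set(pred_a) | set(pred_b)  # exact-coordinate union of both programs
    kept = [
        t for t in merged
        if not any(chrom == t.chrom and overlaps(t.start, t.end, s, e)
                   for chrom, s, e in sines)
    ]
    groups: Dict[str, List[TRNA]] = defaultdict(list)
    for t in kept:
        groups[t.sequence].append(t)
    families = {seq: sorted(ts) for seq, ts in groups.items() if len(ts) > 1}
    single_copy = sorted(ts[0] for ts in groups.values() if len(ts) == 1)
    return families, single_copy
```

A real pipeline would also need to reconcile loci that the two programs report with slightly different boundaries, and would likely cluster near-identical rather than identical sequences.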
Project description: The genomes of RNA viruses encode the information required for replication in host cells both in their linear sequence and in complex higher-order structures. A subset of these RNA genome structures show clear sequence conservation and have been extensively described for well-characterized viruses. However, the extent to which viral RNA genomes contain functional structural elements that cannot be detected by sequence alone, yet are nonetheless critical to viral fitness, is largely unknown. Here, we devise a structure-first experimental strategy and use it to identify 22 structure-similar motifs across the coding sequences of the RNA genomes of the four dengue virus (DENV) serotypes. At least ten of these motifs modulate viral fitness, revealing a substantial, previously unnoticed extent of RNA structure-mediated regulation within viral coding sequences. These viral RNA structures promote a compact global genome architecture, interact with proteins, and regulate the viral replication cycle. These motifs are thus constrained at the levels of both RNA structure and protein sequence, and are potential resistance-refractory targets for antivirals and live-attenuated vaccines. Structure-first identification of conserved RNA structure enables efficient discovery of pervasive RNA-mediated regulation in viral genomes and, likely, in other cellular RNAs.
Project description: Accurate control of tissue-specific gene expression plays a pivotal role in heart development. However, few cardiac transcriptional enhancers have thus far been identified. Extreme non-coding sequence conservation successfully predicts enhancers active in many tissues, but fails to identify substantial numbers of enhancers active in the heart. We used ChIP-seq with the enhancer-associated protein p300 from mouse embryonic heart tissue to identify over three thousand candidate heart enhancers genome-wide. In contrast to other studied tissues at this time point, most candidate heart enhancers are not deeply conserved in vertebrate evolution. Nevertheless, the testing of 130 candidate regions in a transgenic mouse assay revealed that most of them reproducibly function as enhancers active in the heart, irrespective of their degree of evolutionary constraint. These results provide evidence for tissue-dependent differences in evolutionary constraint of enhancers acting through the transcriptional co-activator p300 at this time point, and identify a large population of poorly conserved heart enhancers. Examination of p300 binding in embryonic stage 11.5 mouse heart and midbrain