Predicting mean ribosome load for 5'UTR of any length using deep learning.
ABSTRACT: The 5' untranslated region (5'UTR) plays a key role in regulating mRNA translation and consequently protein abundance. Therefore, accurate modeling of 5'UTR regulatory sequences should provide insights into translational control mechanisms and help interpret genetic variants. Recently, a model was trained on a massively parallel reporter assay to predict mean ribosome load (MRL), a proxy for translation rate, directly from the 5'UTR sequence with a high degree of accuracy. However, this model is restricted to the sequence lengths investigated in the reporter assay and therefore cannot be applied to the majority of human sequences without a substantial loss of information. Here, we introduce frame pooling, a novel neural network operation that enabled the development of an MRL prediction model for 5'UTRs of any length. Our model shows state-of-the-art performance on fixed-length randomized sequences, while offering better generalization performance on longer sequences and on a variety of translation-related genome-wide datasets. Variant interpretation is demonstrated on a 5'UTR variant of the gene HBB associated with beta-thalassemia. Frame pooling could find applications in other bioinformatics predictive tasks. Moreover, our model, released open source, could help pinpoint pathogenic genetic variants.
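The abstract does not spell out how frame pooling works, so the following is only a minimal sketch of one plausible reading: per-position feature vectors (for example, the output of a convolutional layer) are grouped by reading frame relative to the 3' end of the 5'UTR, and each group is reduced by global max and mean pooling. The function name `frame_pool`, the choice of max plus mean, and the zero fill for empty frames are assumptions made for illustration, not details taken from the paper. The point shown is that the output size depends only on the number of channels, not on sequence length.

```python
from typing import List


def frame_pool(features: List[List[float]]) -> List[float]:
    """Pool per-position features into a fixed-size vector, frame by frame.

    features[i][c] is channel c at position i (5' to 3').
    Returns 6 * C values: for each of the 3 frames, C maxima then C means.
    Hypothetical sketch; not the authors' implementation.
    """
    if not features:
        raise ValueError("features must contain at least one position")
    length = len(features)
    n_channels = len(features[0])
    pooled: List[float] = []
    for frame in range(3):
        # Frame is defined by distance to the 3' end, i.e. toward the start codon.
        rows = [features[i] for i in range(length) if (length - 1 - i) % 3 == frame]
        columns = [[row[c] for row in rows] for c in range(n_channels)]
        # Assumption: a frame with no positions (very short input) contributes zeros.
        pooled.extend(max(col) if col else 0.0 for col in columns)
        pooled.extend(sum(col) / len(col) if col else 0.0 for col in columns)
    return pooled


# Inputs of different lengths map to vectors of the same size (6 * 2 = 12).
short = frame_pool([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
longer = frame_pool([[float(i), 1.0] for i in range(7)])
print(len(short), len(longer))
```

Because the pooled vector has a fixed size, dense layers placed after it can be shared across 5'UTRs of any length, which is the property the abstract highlights.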
Project description:Full-length RNA transcribed from the human LINE-1 (L1) element L1 Homo sapiens (L1Hs) has a 900-nt, G+C-rich, 5'-untranslated region (UTR). The 5' UTR is followed by two long open reading frames, ORF1 and ORF2, which are separated from each other by an inter-ORF region of 33 nt that includes two or three in-frame stop codons. We examine here the mechanism(s) by which the translation of L1Hs ORF1 and ORF2 is initiated. A stable hairpin structure (delta G = -74.8 kcal/mol), inserted at nt 661 of the 5' UTR, caused a 3- to 8-fold decrease in the in vitro and in vivo translation of either a lacZ reporter gene for ORF1 or the ORF1 polypeptide product, p40, but translation of a lacZ reporter gene in ORF2 was increased. The results are compatible with a model for ORF1 translation initiation in which the majority of ribosomes scan from a point 5' of nt 661 but suggest that ORF2 is not translated by attached ribosomes that reinitiate after the termination of ORF1 translation. Our data are compatible with a model whereby the translation of L1Hs ORF2 is initiated internally.
Project description:The genomic RNAs of flaviviruses such as dengue virus (DEN) have a 5' m7GpppN cap like those of cellular mRNAs but lack a 3' poly(A) tail. We have studied the contributions to translational expression of 5'- and 3'-terminal regions of the DEN serotype 2 genome by using luciferase reporter mRNAs transfected into Vero cells. DCLD RNA contained the entire DEN 5' and 3' untranslated regions (UTRs), as well as the first 36 codons of the capsid coding region fused to the luciferase reporter gene. Capped DCLD RNA was as efficiently translated in Vero cells as capped GLGpA RNA, a reporter with UTRs from the highly expressed alpha-globin mRNA and a 72-residue poly(A) tail. Analogous reporter RNAs with regulatory sequences from West Nile and Sindbis viruses were also strongly expressed. Although capped DCLD RNA was expressed much more efficiently than its uncapped form, uncapped DCLD RNA was translated 6 to 12 times more efficiently than uncapped RNAs with UTRs from globin mRNA. The 5' cap and DEN 3' UTR were the main sources of the translational efficiency of DCLD RNA, and they acted synergistically in enhancing translation. The DEN 3' UTR increased mRNA stability, although this effect was considerably weaker than the enhancement of translational efficiency. The DEN 3' UTR thus has translational regulatory properties similar to those of a poly(A) tail. Its translation-enhancing effect was observed for RNAs with globin or DEN 5' sequences, indicating no codependency between viral 5' and 3' sequences. Deletion studies showed that translational enhancement provided by the DEN 3' UTR is attributable to the cumulative contributions of several conserved elements, as well as a nonconserved domain adjacent to the stop codon. One of the conserved elements was the conserved sequence (CS) CS1 that is complementary to cCS1 present in the 5' end of the DEN polyprotein open reading frame. Complementarity between CS1 and cCS1 was not required for efficient translation.
Project description:Primary carnitine deficiency is caused by a defect in the active cellular uptake of carnitine by Na+ -dependent organic cation transporter novel 2 (OCTN2). Genetic diagnostic yield for this metabolic disorder has been relatively low, suggesting that disease-causing variants are missed. We Sanger sequenced the 5' untranslated region (UTR) of SLC22A5 in individuals with possible primary carnitine deficiency in whom no or only one mutant allele had been found. We identified a novel 5'-UTR c.-149G>A variant which we characterized by expression studies with reporter constructs in HeLa cells and by carnitine-transport measurements in fibroblasts using a newly developed sensitive assay based on tandem mass spectrometry. This variant, which we identified in 57 of 236 individuals of our cohort, introduces a functional upstream out-of-frame translation initiation codon. We show that the codon suppresses translation from the wild-type ATG of SLC22A5, resulting in reduced OCTN2 protein levels and concomitantly lower transport activity. With an allele frequency of 24.2% the c.-149G>A variant is the most frequent cause of primary carnitine deficiency in our cohort and may explain other reported cases with an incomplete genetic diagnosis. Individuals carrying this variant should be clinically re-evaluated and monitored to determine if this variant has clinical consequences.
Project description:Mammarenaviruses are enveloped viruses with a bisegmented negative-stranded RNA genome that encodes the nucleocapsid protein (NP), the envelope glycoprotein precursor (GPC), the RNA polymerase (L), and a RING matrix protein (Z). Viral proteins are synthesized from subgenomic mRNAs bearing a capped 5' untranslated region (UTR) and lacking a 3' poly(A) tail. We analyzed the translation strategy of Tacaribe virus (TCRV), a prototype of the New World mammarenaviruses. A virus-like transcript that carries a reporter gene in place of the NP open reading frame and transcripts bearing modified 5' and/or 3' UTRs were evaluated in a cell-based translation assay. We found that the presence of the cap structure at the 5' end dramatically increases translation efficiency and that the viral 5' UTR comprises stimulatory signals, while the 3' UTR, specifically the presence of a terminal C+G-rich sequence and/or a stem-loop structure, down-modulates translation. Additionally, translation was profoundly reduced in eukaryotic initiation factor (eIF) 4G-inactivated cells, whereas depletion of intracellular levels of eIF4E had less impact on virus-like mRNA translation than on a cell-like transcript. Translation efficiency was independent of NP expression or TCRV infection. Our results indicate that TCRV mRNAs are translated using a cap-dependent mechanism, whose efficiency relies on the interplay between stimulatory signals in the 5' UTR and a negative modulatory element in the 3' UTR. The low dependence on eIF4E suggests that viral mRNAs may engage yet-unknown noncanonical host factors for a cap-dependent initiation mechanism. IMPORTANCE: Several members of the Arenaviridae family cause serious hemorrhagic fevers in humans. In the present report, we describe the mechanism by which Tacaribe virus, a prototypic nonpathogenic New World mammarenavirus, regulates viral mRNA translation. Our results highlight the impact of untranslated sequences and key host translation factors on this process. We propose a model that explains how viral mRNAs outcompete cellular mRNAs for the translation machinery. A better understanding of the mechanism of translation regulation of this virus can provide the basis for the rational design of new antiviral tools directed to pathogenic arenaviruses.
Project description:Utrophin, the autosomal homologue of dystrophin, can functionally compensate for dystrophin deficiency. Utrophin upregulation could therefore be a therapeutic strategy in Duchenne muscular dystrophy (DMD), which arises from mutations in the dystrophin gene. In contrast to its transcriptional regulation, the mechanisms operating at the post-transcriptional level of utrophin expression have not been well documented. Although the utrophin-A 5'-UTR has been reported to contain an internal ribosome entry site (IRES), its inhibitory effect on translation is also evident. In the present study we therefore aimed to compare the relative contributions of cap-independent and cap-dependent translation with the mouse utrophin-A 5'-UTR, using a reporter assay based on transfection of m7G-capped and A-capped mRNAs. Our results demonstrate that cap-independent translation with the utrophin-A 5'-UTR is not as strong as that driven by a viral IRES. However, the cap-independent mode makes a significant contribution because cap-dependent translation is severely repressed by the utrophin-A 5'-UTR. We further identified two sequence elements and one upstream open reading frame in the utrophin-A 5'-UTR that are responsible for the repression. The repressor elements in the utrophin-A 5'-UTR may be targeted for utrophin upregulation.
Project description:The Bcr-abl1 oncogene causes a shift in the transcription start site of the SMS1 gene (SGMS1) encoding the sphingomyelin (SM) synthesizing enzyme, sphingomyelin synthase 1 (SMS1). This results in an mRNA with a significantly shorter 5'UTR, called 7-SGMS1, which is translated more efficiently than another transcript (IIb-SGMS1) with a longer 5'UTR in Bcr-abl1-positive cells. Here, we determine the effects of these alternative 5'UTRs on SMS1 translation and investigate the key features underlying such regulation. First, the presence of the longer IIb 5'UTR is sufficient to greatly impair translation of a reporter gene. Deletion of the upstream open reading frame (-164 nt) or of the predicted stem-loops in the 5'UTR of IIb-SGMS1 has minimal effects on SGMS1 translation. Conversely, deletion of nucleotides -310 to -132 enhanced translation of IIb-SGMS1 to reach that of 7-SGMS1. We thus suggest that regulatory features within nucleotides -310 to -132 modulate IIb-SGMS1 translation efficiency.
Project description:The neurotrophin brain-derived neurotrophic factor (BDNF) is a key regulator of neuronal development and plasticity. BDNF is a major pharmaceutical target in neurodevelopmental and psychiatric disorders. However, pharmacological modulation of this neurotrophin is challenging because BDNF is generated by multiple, alternatively spliced transcripts with different 5'- and 3'UTRs. Each BDNF mRNA variant is transcribed independently, but translation regulation is unknown. To evaluate the translatability of BDNF transcripts, we developed an in vitro luciferase assay in human neuroblastoma cells. In unstimulated cells, each BDNF 5'- and 3'UTR determined a different basal translation level of the luciferase reporter gene. However, constructs with either a 5'UTR or a 3'UTR alone showed poor translation modulation by BDNF, KCl, dihydroxyphenylglycine, AMPA, NMDA, dopamine, acetylcholine, norepinephrine, or serotonin. Constructs consisting of the luciferase reporter gene flanked by the 5'UTR of one of the most abundant BDNF transcripts in the brain (exons 1, 2c, 4, and 6) and the long 3'UTR responded selectively to stimulation with the different receptor agonists, and only transcripts 2c and 6 were increased by the antidepressants desipramine and mirtazapine. We propose that BDNF mRNA variants represent "a quantitative code" for regulated expression of the protein. Thus, to discriminate the efficacy of drugs in stimulating BDNF synthesis, it is appropriate to use variant-specific in vitro screening tests.
Project description:Translation of mRNA sequences into proteins typically starts at an AUG triplet. In rare cases, translation may also start at alternative non-AUG codons located in the annotated 5' UTR, which increases regulatory complexity. Since ribosome profiling detects translational start sites at the nucleotide level, the properties of these start sites can be used for the statistical evaluation of functional open reading frames. We developed a linear regression approach to predict in-frame and out-of-frame translational start sites within the 5' UTR from mRNA sequence information, together with their translation initiation confidence. Predicted start codons comprise AUG as well as near-cognate codons. The underlying datasets are based on published translational start sites for human HEK293 and mouse embryonic stem cells that were derived by the original authors from ribosome profiling data. The average prediction accuracy of true vs. false start sites for HEK293 cells was 80%. When applied to mouse mRNA sequences, the same model predicted translation initiation sites observed in mouse ES cells with an accuracy of 76%. Moreover, we illustrate the effect of in silico mutations in the flanking sequence context of a start site on the predicted initiation confidence. Our new webservice PreTIS visualizes alternative start sites and their respective ORFs and predicts their ability to initiate translation. Only the mRNA sequence is required as input. PreTIS is accessible at http://service.bioinformatik.uni-saarland.de/pretis.
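To make the notions of in-frame and out-of-frame upstream start sites concrete, here is a small sketch that lists candidate start codons in a 5' UTR and labels each by its reading frame relative to the main start codon. It is not the PreTIS regression model and assigns no initiation confidence. The function name `find_upstream_starts` is hypothetical, and near-cognate codons are taken here to mean the nine codons that differ from AUG at exactly one position, which is an assumption about the codon set rather than something the abstract states.

```python
from typing import List, Tuple

# Assumption: near-cognate = one-nucleotide differences from ATG (DNA alphabet).
NEAR_COGNATE = {"CTG", "GTG", "TTG", "ACG", "AAG", "AGG", "ATA", "ATC", "ATT"}


def find_upstream_starts(
    utr: str, include_near_cognate: bool = False
) -> List[Tuple[int, str, str]]:
    """Return (0-based position, codon, frame label) for candidate start codons.

    `utr` is the 5' UTR only; the main start codon is assumed to begin
    immediately after its last nucleotide.
    """
    seq = utr.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        if codon == "ATG" or (include_near_cognate and codon in NEAR_COGNATE):
            # Distance from the first base of this codon to the main start codon.
            distance = len(seq) - i
            frame = "in-frame" if distance % 3 == 0 else "out-of-frame"
            hits.append((i, codon, frame))
    return hits


print(find_upstream_starts("GGAUGCC"))
print(find_upstream_starts("CTGAAA", include_near_cognate=True))
```

A predictor along the lines described above would then score each candidate, for example from its flanking sequence context, rather than reporting every match.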
Project description:Cellular stress such as endoplasmic reticulum stress, hypoxia, and viral infection activates an integrated stress response, which includes the phosphorylation of the eukaryotic initiation factor 2alpha (eIF2alpha) to inhibit overall protein synthesis. Paradoxically, this leads to translation of a subset of mRNAs, like transcription factor ATF4, which in turn induces transcription of downstream stress-induced genes such as growth arrest DNA-inducible gene 34 (GADD34). GADD34 interacts with protein phosphatase 1 to dephosphorylate eIF2alpha, resulting in a negative feedback loop to recover protein synthesis and allow translation of stress-induced transcripts. Here, we show that GADD34 is not only transcriptionally induced but also translationally regulated to ensure maximal expression during eIF2alpha phosphorylation. GADD34 mRNAs are preferentially associated with polysomes during eIF2alpha phosphorylation, which is mediated by its 5'-untranslated region (5'UTR). The human GADD34 5'UTR contains two non-overlapping upstream open reading frames (uORFs), whereas the mouse version contains two overlapping and out of frame uORFs. Using 5'UTR GADD34 reporter constructs, we show that the downstream uORF mediates repression of basal translation and directs translation during eIF2alpha phosphorylation. Furthermore, we show that the upstream uORF is poorly translated and that a proportion of scanning ribosomes bypasses the upstream uORF to recognize the downstream uORF. These findings suggest that GADD34 translation is regulated by a unique 5'UTR uORF mechanism to ensure proper GADD34 expression during eIF2alpha phosphorylation. This mechanism may serve as a model for understanding how other 5'UTR uORF-containing mRNAs are regulated during cellular stress.
Project description:The 5'-UTR of the actin-related protein 2/3 complex subunit 2 (ARPC2) mRNA exists in two variants. Using a bicistronic reporter construct, the present study demonstrates that the longer variant of the 5'-UTR harbours an internal ribosome entry site (IRES) which is lacking in the shorter one. Multiple control assays confirmed that only this variant promotes cap-independent translation. Furthermore, it includes a guanine-rich region that is capable of forming a guanine-quadruplex (G-quadruplex) structure which was found to contribute to the IRES activity. To investigate the cellular function of the IRES element, we determined the expression level of ARPC2 at various cell densities. At high cell density, the relative ARPC2 protein level increases, supporting the presumed function of IRES elements in driving the expression of certain genes under stressful conditions that compromise cap-dependent translation. Based on chemical probing experiments and computer-based predictions, we propose a structural model of the IRES element, which includes the G-quadruplex motif exposed from the central stem-loop element. Taken together, our study describes the functional relevance of two alternative 5'-UTR splice variants of the ARPC2 mRNA, one of which contains an IRES element with a G-quadruplex as a central motif, promoting translation under stressful cellular conditions.