Project description:XLalphas and ALEX are structurally unrelated mammalian proteins translated from alternative overlapping reading frames of a single transcript. Not only are they encoded by the same locus, but a specific XLalphas/ALEX interaction is essential for G-protein signaling in neuroendocrine cells. A disruption of this interaction leads to abnormal human phenotypes, including mental retardation and growth deficiency. The region of overlap between the two reading frames evolves at a remarkable speed: the divergence between human and mouse ALEX polypeptides makes them virtually unalignable. To trace the evolution of this puzzling locus, we sequenced it in apes, Old World monkeys, and a New World monkey. We show that the overlap between the two reading frames and the physical interaction between the two proteins force the locus to evolve in an unprecedented way. Namely, to maintain two overlapping protein-coding regions the locus is forced to have high GC content, which significantly elevates its intrinsic evolutionary rate. However, the two encoded proteins cannot afford to change too quickly relative to each other as this may impair their interaction and lead to severe physiological consequences. As a result XLalphas and ALEX evolve in an oscillating fashion constantly balancing the rates of amino acid replacements. This is the first example of a rapidly evolving locus encoding interacting proteins via overlapping reading frames, with a possible link to the origin of species-specific neurological differences.
Project description:Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may provide a mechanism to increase the information content of compact genomes. The presence of overlapping reading frames (OvRFs) can skew estimates of selection based on the rates of non-synonymous and synonymous substitutions, since a substitution that is synonymous in one reading frame may be non-synonymous in another and vice versa. To understand the impact of OvRFs on molecular evolution, we implemented a versatile simulation model of nucleotide sequence evolution along a phylogeny with any distribution of open reading frames in linear or circular genomes. We use a custom data structure to track the substitution rates at every nucleotide site, which are determined by the stationary nucleotide frequencies, transition bias and the distribution of selection biases (dN/dS) in the respective reading frames. Our simulation model is implemented in the Python scripting language. All source code is released under the GNU General Public License version 3 and is available at https://github.com/PoonLab/HexSE.
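The point that one substitution can be synonymous in one reading frame and non-synonymous in another can be illustrated with a minimal sketch. This is not the HexSE implementation; the function name `effect`, the toy sequence, and the use of the standard genetic code are assumptions made only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed for this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effect(seq, pos, new_base, frame_start):
    """Classify a substitution at index `pos` within the reading frame that
    begins at index `frame_start` (hypothetical helper, 0-based indices).

    Requires frame_start <= pos and a complete codon around `pos`.
    """
    codon_start = pos - (pos - frame_start) % 3   # first base of the affected codon
    old = seq[codon_start:codon_start + 3]
    offset = pos - codon_start                    # position of the change in the codon
    new = old[:offset] + new_base + old[offset + 1:]
    same = CODON_TABLE[old] == CODON_TABLE[new]
    return "synonymous" if same else "non-synonymous"


# Toy sequence: frame 0 reads ATG GCT GAA, frame 1 reads TGG CTG.
seq = "ATGGCTGAA"
print(effect(seq, 5, "C", frame_start=0))  # GCT -> GCC (Ala -> Ala)
print(effect(seq, 5, "C", frame_start=1))  # CTG -> CCG (Leu -> Pro)
```

In this toy case the same T-to-C change falls on a third codon position in one frame and a second codon position in the other, which is why a single dN/dS estimate per site can be misleading when frames overlap.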
Project description:The levels of telomeric proteins, such as telomerase, can have profound effects on telomere function, cell division and human disease. Here we demonstrate how levels of Stn1, a component of the conserved telomere capping CST (Cdc13, Stn1, Ten1) complex, are tightly regulated by an upstream overlapping open reading frame (oORF). In budding yeast, inactivation of the STN1 oORF leads to a 10-fold increase in Stn1 levels, reduced telomere length, suppression of cdc13-1 and enhancement of yku70Δ growth defects. The STN1 oORF impedes translation of the main ORF and reduces STN1 mRNA via the nonsense-mediated mRNA decay (NMD) pathway. Interestingly, the homologs of the translation re-initiation factors MCT-1 and DENR (Tma20 and Tma22, respectively) also reduce Stn1 levels via the oORF. Human STN1 also contains oORFs, which reduce expression, demonstrating that oORFs are a conserved mechanism for reducing Stn1 levels. Bioinformatic analyses of the yeast and human transcriptomes show that oORFs are more underrepresented than upstream ORFs (uORFs) and associated with lower protein abundance. We propose that oORFs are an important mechanism to control expression of a subset of the proteome.
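As a rough illustration of what a transcriptome scan for oORFs might look for, the sketch below flags upstream AUGs that are out of frame with the main ORF and whose reading frame runs past the main start codon. It is not the authors' pipeline; the function `find_oorfs`, the toy transcripts, and the exact overlap criterion are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_oorfs(mrna, main_start):
    """Return (start, end) pairs for upstream ORFs that overlap the main ORF.

    `main_start` is the 0-based index of the A in the main ATG. `end` is
    exclusive and includes the stop codon. Hypothetical helper for illustration.
    """
    hits = []
    for start in range(main_start):
        if mrna[start:start + 3] != "ATG":
            continue
        if (main_start - start) % 3 == 0:
            # In-frame upstream AUGs give a uORF or an N-terminal extension,
            # not an out-of-frame overlapping ORF, so they are skipped here.
            continue
        for i in range(start, len(mrna) - 2, 3):
            if mrna[i:i + 3] in STOPS:
                if i + 3 > main_start:        # ORF extends past the main start
                    hits.append((start, i + 3))
                break                          # first in-frame stop ends the ORF
    return hits


# Toy transcript: upstream ATG at index 2, main ATG at index 6 (different frame).
print(find_oorfs("GGATGCATGCCTAAGTGA", main_start=6))
# Toy transcript whose upstream ORF stops before the main ATG (a plain uORF).
print(find_oorfs("ATGTAACATGGCCTAA", main_start=7))
```

Upstream ORFs with no in-frame stop inside the transcript are ignored by this sketch; a real analysis would need a policy for such cases.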
Project description:The >1 kb XL-exon of the rat XLalphas/Galphas gene encodes the 37 kDa XL-domain, the N-terminal half of the 78 kDa neuroendocrine-specific G-protein alpha-subunit XLalphas. Here, we describe a novel feature of the XL-exon, the presence of an alternative >1 kb open reading frame (ORF) that completely overlaps with the ORF encoding the XL-domain. The alternative ORF starts 32 nucleotides downstream of the start codon for the XL-domain and is terminated by a stop codon exactly at the end of the XL-exon. The alternative ORF encodes ALEX, a very basic (pI 11.8), proline-rich protein of 356 amino acids. Both XLalphas and ALEX are translated from the same mRNA. Like XLalphas, ALEX is expressed in neuroendocrine cells and tightly associated with the cytoplasmic leaflet of the plasma membrane. Remarkably, ALEX binds to the XL-domain of XLalphas. Our results reveal a mechanism of gene usage that is without precedent in mammalian genomes.
Project description:A universal molybdenum-containing cofactor (MoCo) is essential for the activity of all human molybdoenzymes, including sulphite oxidase. The free cofactor is highly unstable, and all organisms share a similar biosynthetic pathway. The involved enzymes exhibit homologies, even between bacteria and humans. We have exploited these homologies to isolate a cDNA for the heterodimeric molybdopterin (MPT)-synthase. This enzyme is necessary for the conversion of an unstable precursor into molybdopterin, the organic moiety of MoCo. The corresponding transcript shows a bicistronic structure, encoding the small and large subunits of the MPT-synthase in two different open reading frames (ORFs) that overlap by 77 nucleotides. In various human tissues, only one size of mRNA coinciding with the bicistronic transcript was detected. In vitro translation and mutagenesis experiments demonstrated that each ORF is translated independently, leading to the synthesis of a 10-kDa protein and a 21-kDa protein for the small and large subunits, respectively, and indicated that the 3'-proximal ORF of the bicistronic transcript is translated by leaky scanning.
Project description:Understanding the dark genome is a priority task following the complete sequencing of the human genome. Short open reading frames (sORFs) are a group of largely unexplored elements of the dark genome with the potential for being translated into microproteins. The definitive number of coding and regulatory sORFs is not known; however, they could account for up to 1-2% of the human genome. This is of the same order of magnitude as canonical coding genes. For a few sORFs a clinical relevance has already been demonstrated, but for the majority of potential sORFs the biological function remains unclear. A major limitation in predicting their disease relevance using large-scale genomic data is the fact that no population-level constraint metrics for genetic variants in sORFs are yet available. To overcome this, we used the recently released gnomAD 4.0 dataset and analyzed the constraint of a consensus set of sORFs and their genomic neighbors. We demonstrate that sORFs are mostly embedded into a moderately constrained genomic context, but within the GENCODE dataset we identified a subset of highly constrained sORFs comparable to highly constrained canonical genes.
Project description:In this study, we show that the coronavirus (CoV) genome may encode many functional hydrophobic alpha-helical peptides (HAHPs) in overlapping reading frames of major coronaviral proteins throughout the entire viral genome. These HAHPs can theoretically be expressed from non-canonical sub-genomic (sg)RNAs that are synthesized in substantial amounts in infected cells. We selected and analyzed five and six HAHPs encoded in the S gene regions of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and Middle East respiratory syndrome coronavirus (MERS-CoV), respectively. Two and three HAHPs derived from SARS-CoV-2 and MERS-CoV, respectively, specifically interacted with both the SARS-CoV-2 and MERS-CoV S proteins and inhibited their membrane fusion activity. Furthermore, one of the SARS-CoV-2 HAHPs specifically inhibited viral RNA synthesis by accumulating at the site of viral RNA synthesis. Our data show that a group of HAHPs in the coronaviral genome potentially has a regulatory role in viral propagation.
Project description:Caliciviruses have positive-sense RNA genomes, typically with short 5'-untranslated regions (5'UTRs) that precede the long open reading frame 1 (ORF1). Exceptionally, some avian caliciviruses have long 5'UTRs containing a picornavirus-like internal ribosomal entry site (IRES), which was likely acquired by horizontal gene transfer. Here, we identified numerous additional avian calicivirus genomes with IRESs, predominantly type 2, and determined that many of these genomes contain a ~200-300 codon-long ORF (designated ORF1*) that overlaps the 5'-terminal region of ORF1. The activity of representative type 2 IRESs from grey teal calicivirus (GTCV) and Caliciviridae sp. isolate yc-13 (RaCV1) was confirmed by in vitro translation. Toeprinting showed that in cell-free extracts and in vitro reconstituted reactions, ribosomal initiation complexes assembled on the ORF1* initiation codon and at one or two AUG codons in ORF1 at the 3'-border and/or downstream of the IRES. Initiation at all three sites required eIF4A and eIF4G, which bound to a conserved region of the IRES; initiation on the ORF1* and principal ORF1 initiation codons involved eIF1/eIF1A-dependent scanning from the IRES's 3'-border. Initiation on these IRESs was enhanced by the IRES trans-acting factors (ITAFs) Ebp1/ITAF45, which bound to the apical subdomain Id of the IRES, and PTB (GTCV) or PCBP2 (RaCV1).
Project description:Mutants affected at the LYS5 locus of Yarrowia lipolytica lack detectable saccharopine dehydrogenase (SDH) activity. The LYS5 gene has previously been cloned, and we present here the sequence of the 2.5-kilobase-pair (kb) DNA fragment complementing the lys5 mutation. Two large antiparallel open reading frames (ORF1 and ORF2) were observed, flanked by potential transcription signals. Both ORFs appear to be transcribed, but several lines of evidence suggest that only ORF2 is translated and encodes SDH. (i) The global amino acid compositions of Saccharomyces cerevisiae SDH and of the putative ORF2 product are similar, whereas that of ORF1 is dissimilar. (ii) An in-frame translational fusion of ORF2 with the Escherichia coli lacZ gene was introduced into yeast cells and resulted in a beta-galactosidase activity regulated similarly to SDH; no beta-galactosidase activity was obtained with an in-frame fusion of ORF1 with lacZ. (iii) The introduction of a stop codon at the beginning of ORF2 prevented SDH expression in yeast cells, whereas no phenotypic effect was observed when ORF1 translation was blocked.
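Antiparallel ORFs such as ORF1 and ORF2 lie on opposite strands, so finding them means scanning both the sequence and its reverse complement. The sketch below shows one simple way to do that; the helper names and the toy sequence are assumptions, and it does not reproduce the analysis in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return the longest ATG-to-stop stretch (stop included) found in the
    three forward frames of `seq`, or "" if there is none."""
    best = ""
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                break                      # first in-frame stop ends this ORF
    return best


# Toy fragment: one ORF on the given strand, one on the opposite strand.
fragment = "ATGAAATAATCAGGGCAT"
print(longest_orf(fragment))                      # ORF on the given strand
print(longest_orf(reverse_complement(fragment)))  # ORF on the opposite strand
```

A sequence scan of this kind only shows that both ORFs exist; as the abstract stresses, deciding which one is translated required the lacZ fusions and the stop-codon experiments.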
Project description:We engineered short ORFs and used them to control the expression level of recombinant proteins. These short ORFs, encoding a two-amino acid peptide, were placed upstream of an ORF encoding a protein of interest. Insertion of these upstream ORFs (uORFs) resulted in suppression of protein expression. By varying the base sequence preceding the uORF, we sought to vary the translation initiation rate of the uORF and subsequently control the degree of this suppression. Using this strategy, we generated a library of RNA sequence elements that can specify protein expression over a broad range of levels. By also using multiple uORFs in series and non-AUG start codons, we were able to generate particularly low expression levels, allowing us to achieve expression levels spanning three orders of magnitude. Modeling supported a mechanism where uORFs shunt the flow of ribosomes away from the downstream protein-coding ORF. With a lower translation initiation rate at the uORF, more ribosomes "leak" past the uORF; consequently, more ribosomes are able to reach and translate the downstream ORF. We report expression control by engineering uORFs and translation initiation to be robust, predictable, and reproducible across all cell types tested. We propose control of translation initiation as a primary method of choice for tuning expression in mammalian systems.
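The "leaky scanning" picture described above can be made concrete with a deliberately simple sketch. It assumes that each uORF independently captures a fixed fraction of scanning ribosomes and that captured ribosomes never re-initiate downstream; these assumptions, the function name, and the example probabilities are illustrative and are not taken from the authors' model.

```python
from math import prod


def fraction_reaching_main_orf(init_probs):
    """Fraction of scanning ribosomes that leak past every uORF.

    `init_probs` holds one assumed initiation probability per uORF, in 5' to 3'
    order. Ribosomes that initiate at a uORF are treated as lost (no
    re-initiation), which is a simplifying assumption of this sketch.
    """
    for p in init_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("initiation probabilities must lie in [0, 1]")
    return prod(1.0 - p for p in init_probs)


print(fraction_reaching_main_orf([]))               # no uORF: all ribosomes arrive
print(fraction_reaching_main_orf([0.5]))            # weak uORF start context
print(fraction_reaching_main_orf([0.9]))            # strong uORF start context
print(fraction_reaching_main_orf([0.9, 0.9, 0.9]))  # three strong uORFs in series
```

Under these assumptions, lowering the initiation probability at a uORF raises downstream expression, and placing several uORFs in series multiplies their effects, which is one way a wide dynamic range could arise.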