Project description:Open reading frames (ORFs) are genomic DNA sequences that have the potential to be translated. Genome annotation pipelines dismiss translation products of small ORFs (smORFs) of fewer than 100 codons (<300 nucleotides) as unlikely to have a biological function. In this study, we systematically characterized smORFs in mouse B and T cells under different conditions and predicted a total of 5744 unique actively translated smORFs. We then extended our analysis to ORFs of 101-200 codons in length and predicted 945 such longer translation products. Additionally, our results suggest the existence of candidate secreted micropeptides. Verifying their existence and identifying their functions will be essential and could lead to useful applications.
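The length thresholds in the description above (fewer than 100 codons for smORFs, 101-200 codons for the extended set) can be illustrated with a minimal Python sketch. This is not the study's prediction pipeline, which relies on translation evidence. It is only a forward-strand, ATG-to-stop scan under stated assumptions: the function names `find_orfs` and `classify_orf` are hypothetical, the standard stop codons are used, and codon counts exclude the stop codon.

```python
# Illustrative sketch only: naive ORF scan plus length binning.
# Assumptions (not from the source): ATG start codons, standard stop codons,
# forward strand only, codon count excludes the stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, n_codons) for ATG-initiated ORFs in all three forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # sense codons, stop excluded
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return orfs


def classify_orf(n_codons):
    """Bin an ORF by length using the thresholds quoted in the description."""
    if n_codons < 100:
        return "smORF"
    if 101 <= n_codons <= 200:
        return "extended"
    # Exactly 100 codons falls in neither bin as the description is worded.
    return "other"


if __name__ == "__main__":
    for start, end, n in find_orfs("ATGAAATGA"):
        print(start, end, n, classify_orf(n))
```

A scan like this only enumerates candidates; calling an ORF actively translated requires ribosome profiling evidence, as the description states.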
Project description:We present a genome-wide assessment of small open reading frame (smORF) translation by ribosomal profiling of polysomal fractions in Drosophila S2 cells. In this way, mRNAs bound by multiple ribosomes, and hence actively translated, can be isolated and distinguished from mRNAs bound by sporadic, putatively non-productive single ribosomes or ribosomal subunits. Ribosomal profiling of large and small polysomal fractions in Drosophila S2 cells was used to assess translation of smORFs.
Project description:We present a genome-wide assessment of small open reading frame (smORF) translation by ribosomal profiling of polysomal fractions in Drosophila S2 cells. In this way, mRNAs bound by multiple ribosomes, and hence actively translated, can be isolated and distinguished from mRNAs bound by sporadic, putatively non-productive single ribosomes or ribosomal subunits.
Project description:Accurate annotations of protein coding regions are essential for understanding how genetic information is translated into biological functions. The recent development of ribosome footprint profiling provides an important new tool for measuring translation. Here we describe riboHMM, a new method that uses ribosome footprint data along with gene expression and sequence information to accurately infer translated sequences. We applied our method to human lymphoblastoid cell lines and identified 7,863 previously unannotated coding sequences, including 445 translated sequences in pseudogenes and 2,442 translated upstream open reading frames. We observed an enrichment of harringtonine-treated ribosome footprints at the inferred initiation sites, validating many of the novel coding sequences. In aggregate, the novel sequences exhibit significant signatures of purifying selection indicative of protein-coding function, suggesting that many of the novel sequences are functional. We observed that nearly 40% of bicistronic transcripts showed significant negative correlation in the levels of translation of their two coding sequences, suggesting a key regulatory role for these novel translated sequences. Despite evidence for their functional importance, the novel peptide sequences were detected by mass spectrometry at a lower rate than predicted based on data from annotated proteins, thus suggesting that many of the novel peptide products may be relatively short-lived. Our work illustrates the value of ribosome profiling for improving coding annotations, and significantly expands the set of known coding regions.
Project description:Protein-coding small open reading frames (smORFs) are emerging as an important class of genes; however, the coding capacity of smORFs in the human genome is unclear. By integrating de novo transcriptome assembly and Ribo-Seq, we confidently annotate thousands of novel translated smORFs in three human cell lines. We find that smORF translation prediction is noisier than for annotated coding sequences, underscoring the importance of analyzing multiple experiments and footprinting conditions. These smORFs are located within non-coding and antisense transcripts, the UTRs of mRNAs, and unannotated transcripts. Analysis of RNA levels and translation efficiency during cellular stress identifies regulated smORFs and provides an approach for identifying smORFs for further investigation. Sequence conservation and signatures of positive selection indicate that encoded microproteins are likely functional. Additionally, proteomics data from enriched human leukocyte antigen complexes validate the translation of hundreds of smORFs and position them as a source of novel antigens. Thus, smORFs represent a significant number of important, yet unexplored human genes.
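The description above mentions translation efficiency without defining it. A common convention, assumed here rather than taken from the study, is the ratio of ribosome footprint density to RNA abundance for the same ORF, compared between conditions on a log2 scale. The sketch below uses hypothetical function names and assumes the input counts are already normalized for library size and ORF length.

```python
import math

# Illustrative sketch only. Assumptions (not from the source):
# - translation efficiency (TE) = Ribo-Seq signal / RNA-seq signal
# - inputs are already normalized (library size, ORF length)
# - a pseudocount guards against division by zero for low-count smORFs


def translation_efficiency(ribo, rna, pseudocount=1.0):
    """Ratio of footprint signal to RNA signal for one ORF."""
    return (ribo + pseudocount) / (rna + pseudocount)


def log2_te_change(ribo_ctrl, rna_ctrl, ribo_stress, rna_stress, pseudocount=1.0):
    """log2 fold change in TE between a stress and a control condition."""
    te_ctrl = translation_efficiency(ribo_ctrl, rna_ctrl, pseudocount)
    te_stress = translation_efficiency(ribo_stress, rna_stress, pseudocount)
    return math.log2(te_stress / te_ctrl)


if __name__ == "__main__":
    # Hypothetical values: footprints rise under stress while RNA stays flat.
    print(log2_te_change(ribo_ctrl=9, rna_ctrl=9, ribo_stress=39, rna_stress=9))
```

A positive value suggests increased translation per transcript under stress, and a negative value suggests the opposite. Because smORFs tend to have low counts, a real analysis would likely need replicate-aware statistics rather than a single ratio.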
Project description:Ribosome profiling has revealed pervasive but largely uncharacterized translation outside of canonical coding sequences (CDSs). Here, we exploit a systematic CRISPR-based screening strategy to identify hundreds of non-canonical CDSs that are essential for cellular growth and whose disruption elicits specific, robust transcriptomic and phenotypic changes in human cells. Functional characterization of the encoded microproteins reveals distinct cellular localizations, specific protein binding partners, and hundreds that are presented by the HLA system. Interestingly, we find multiple microproteins encoded in upstream open reading frames that form stable complexes with the main, canonical protein encoded on the same mRNA, thus revealing the diverse use of functional bicistronic operons in mammals. Together, our results point to a family of functional human microproteins that play critical and diverse cellular roles.
Project description:All species continuously evolve small open reading frames (sORFs) that can be templated for protein synthesis and may provide raw material for evolutionary adaptation. We analyzed the evolutionary origins of 7,264 recently cataloged human sORFs and found that most were evolutionarily young and emerged de novo. We additionally discovered 221 as yet uncataloged peptides smaller than 16 amino acids, for which we found evidence of translation in human and rodent tissues. To investigate the potential bioactivity of microproteins and peptides translated from young and very small ORFs, we established MiPRISMA: a mass spectrometry-based interactome screen with sequence motif resolution. We assessed 266 candidates and implicated several in essential cellular processes including splicing, translational regulation, and endocytosis, a subset of which we validated in cellular assays. MiPRISMA provides a scalable platform to characterize small and evolutionarily young proteins, shedding light on a largely unexplored territory of the putative human proteome.