Project description:More than 40% of the human genome is generated by retrotransposition, a series of in vivo processes involving reverse transcription of RNA molecules and integration of the transcripts into the genomic sequence. The mechanism of retrotransposition, however, is not fully understood, and additional genomic elements generated by retrotransposition may remain to be discovered. Here, we report that the human genome contains many previously unidentified short pseudogenes generated by retrotransposition of mRNAs. Genomic elements generated by non-long terminal repeat retrotransposition have specific sequence signatures: a poly-A tract that is immediately downstream and a pair of duplicated sequences, called target site duplications (TSDs), at either end. Using a new computer program, TSDscan, that can accurately detect pseudogenes based on the presence of the poly-A tract and TSDs, we found 654 short (≤ 300 bp), previously unknown pseudogenes derived from mRNAs. Comprehensive analyses of the pseudogenes that we identified and their parent mRNAs revealed that the pseudogene length depends on the parent mRNA length: long mRNAs generate more short pseudogenes than do short mRNAs. To explain this phenomenon, we hypothesize that most long mRNAs are truncated before they are reverse transcribed. Truncated mRNAs would be rapidly degraded during reverse transcription, resulting in the generation of short pseudogenes.
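The signature described above (an insert followed by a poly-A tract, flanked on both sides by TSDs) can be illustrated with a toy scan. This is a minimal sketch and not the TSDscan algorithm, whose details the description does not give. It assumes exact-match TSDs of fixed length, whereas real TSDs vary in length and may contain mismatches. The function names and the parameters `min_polya`, `tsd_len`, and `max_insert` are hypothetical.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    tsd_5p_start: int   # start of the upstream (5') TSD copy
    insert_start: int   # first base after the 5' TSD
    polya_start: int    # start of the poly-A tract
    polya_end: int      # end of the poly-A tract (exclusive)
    tsd: str            # duplicated sequence


def find_polya_tracts(seq: str, min_len: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for every run of 'A' at least min_len long."""
    i, n = 0, len(seq)
    while i < n:
        if seq[i] == "A":
            j = i
            while j < n and seq[j] == "A":
                j += 1
            if j - i >= min_len:
                yield i, j
            i = j
        else:
            i += 1


def find_tsd_candidates(seq: str, min_polya: int = 10, tsd_len: int = 8,
                        max_insert: int = 300) -> List[Candidate]:
    """Toy scan: poly-A tract, then a 3' TSD, with an exact copy upstream."""
    seq = seq.upper()
    hits = []
    for pa_start, pa_end in find_polya_tracts(seq, min_polya):
        # Assumed 3' TSD: the bases immediately after the poly-A tract.
        tsd = seq[pa_end:pa_end + tsd_len]
        if len(tsd) < tsd_len:
            continue
        # Look for the closest exact copy upstream, within max_insert bases.
        window_start = max(0, pa_start - max_insert - tsd_len)
        pos = seq.rfind(tsd, window_start, pa_start)
        if pos != -1:
            hits.append(Candidate(pos, pos + tsd_len, pa_start, pa_end, tsd))
    return hits


if __name__ == "__main__":
    tsd = "GATTCGCA"
    demo = "CCGGTTCC" + tsd + "CTGCTGACGTCGT" + "A" * 12 + tsd + "GGCCTTGG"
    for hit in find_tsd_candidates(demo):
        print(hit, "insert length:", hit.polya_start - hit.insert_start)
```

A real tool would additionally need to tolerate TSD mismatches, handle both strands, and align the insert to a parent mRNA. None of that is attempted here.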
Project description:Circular RNAs (circRNAs) are covalently closed non-coding RNAs lacking the 5' cap and the poly-A tail. Nevertheless, it has been demonstrated that certain circRNAs can undergo active translation. Therefore, aberrantly expressed circRNAs in human cancers could be an unexplored source of tumor-specific antigens, potentially mediating anti-tumor T cell responses. This study presents an immunopeptidomics workflow with a specific focus on generating a circRNA-specific protein fasta reference. The main goal of this workflow is to streamline the process of identifying and validating human leukocyte antigen (HLA) bound peptides potentially originating from circRNAs. We increase the analytical stringency of our workflow by retaining peptides identified independently by two mass spectrometry search engines and/or by applying a group-specific FDR for canonical-derived and circRNA-derived peptides. A subset of circRNA-derived peptides specifically encoded by the region spanning the back-splice junction (BSJ) are validated with targeted MS, and with direct Sanger sequencing of the respective source transcripts. Our workflow identifies 54 unique BSJ-spanning circRNA-derived peptides in the immunopeptidome of melanoma and lung cancer samples. Our approach enlarges the catalog of source proteins that can be explored for immunotherapy.
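The two stringency filters mentioned above can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation. It assumes FDR is estimated by the target-decoy approach (the description does not say how), that each peptide-spectrum match carries a score where higher is better, and that score ties can be ignored. The tuple layout and the function names `group_fdr_filter` and `consensus` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# (peptide, score, is_decoy, group); group is e.g. "canonical" or "circRNA"
PSM = Tuple[str, float, bool, str]


def group_fdr_filter(psms: Iterable[PSM], fdr: float = 0.01) -> Set[str]:
    """Accept target peptides whose q-value, computed within their own
    group, is at or below the requested FDR threshold."""
    by_group = defaultdict(list)
    for psm in psms:
        by_group[psm[3]].append(psm)

    accepted = set()
    for items in by_group.values():
        items.sort(key=lambda p: p[1], reverse=True)  # best score first
        targets = decoys = 0
        fdrs = []
        for _, _, is_decoy, _ in items:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            # Target-decoy estimate at this score threshold.
            fdrs.append(decoys / max(targets, 1))
        # q-value: lowest FDR at this or any more permissive threshold.
        q = float("inf")
        qvals = [0.0] * len(items)
        for i in range(len(items) - 1, -1, -1):
            q = min(q, fdrs[i])
            qvals[i] = q
        for (peptide, _, is_decoy, _), qval in zip(items, qvals):
            if not is_decoy and qval <= fdr:
                accepted.add(peptide)
    return accepted


def consensus(engine_a: Set[str], engine_b: Set[str]) -> Set[str]:
    """Keep only peptides identified independently by both search engines."""
    return engine_a & engine_b


if __name__ == "__main__":
    demo = [
        ("P1", 10.0, False, "canonical"), ("P2", 9.0, False, "canonical"),
        ("D1", 8.0, True, "canonical"), ("P3", 7.0, False, "canonical"),
        ("C1", 10.0, False, "circRNA"), ("D2", 9.0, True, "circRNA"),
        ("C2", 8.0, False, "circRNA"),
    ]
    print(sorted(group_fdr_filter(demo, fdr=0.05)))
```

Computing the FDR per group matters because the circRNA-derived search space is small relative to the canonical proteome, so a single pooled threshold could let the canonical hits mask a high error rate among circRNA-derived hits.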
Project description:Pseudogenes are generally considered to be non-functional DNA sequences that arise through nonsense or frame-shift mutations of protein-coding genes. Although certain pseudogene-derived RNAs have regulatory roles, and some pseudogene fragments are translated, no clear functions for pseudogene-derived proteins are known. Olfactory receptor families contain many pseudogenes, which reflect low selection pressures on loci no longer relevant to the fitness of a species. Here we report the characterization of a pseudogene in the chemosensory variant ionotropic glutamate receptor repertoire of Drosophila sechellia, an insect endemic to the Seychelles that feeds almost exclusively on the ripe fruit of Morinda citrifolia. This locus, D. sechellia Ir75a, bears a premature termination codon (PTC) that appears to be fixed in the population. However, D. sechellia Ir75a encodes a functional receptor, owing to efficient translational read-through of the PTC. Read-through is detected only in neurons and is independent of the type of termination codon, but depends on the sequence downstream of the PTC. Furthermore, although the intact Drosophila melanogaster Ir75a orthologue detects acetic acid (a chemical cue important for locating fermenting food, found only at trace levels in Morinda fruit), D. sechellia Ir75a has evolved distinct odour-tuning properties through amino-acid changes in its ligand-binding domain. We identify functional PTC-containing loci within different olfactory receptor repertoires and species, suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon.
Project description:Recent advances in the analysis of RNA sequencing data have shown that pseudogenes are highly specific markers of cell identity and can be used as diagnostic and prognostic markers. Furthermore, genetically engineered mouse models have recently provided compelling support for a causal link between altered pseudogene expression and cancer. In this review, we discuss the most recent milestones reached in the pseudogene field and the use of pseudogenes as cancer classifiers.
Project description:Unlike linear RNAs terminated with 5' caps and 3' tails, circular RNAs are characterized by covalently closed loop structures with neither 5' to 3' polarity nor polyadenylated tail. This intrinsic characteristic has led to the general under-estimation of the existence of circular RNAs in previous polyadenylated transcriptome analyses. With the advent of specific biochemical and computational approaches, a large number of circular RNAs from back-spliced exons (circRNAs) have been identified in various cell lines and across different species. Recent studies have uncovered that back-splicing requires canonical spliceosomal machinery and can be facilitated by both complementary sequences and specific protein factors. In this review, we highlight our current understanding of the regulation of circRNA biogenesis, including both the competition between splicing and back-splicing and the previously under-appreciated alternative circularization.
Project description:By definition, pseudogenes are relics of former genes that no longer possess biological functions. Operationally, they are identified based on disruptions of open reading frames (ORFs) or presumed losses of promoter activities. Intriguingly, a recent human proteomic study reported peptides encoded by 107 pseudogenes. These peptides may play currently unrecognized physiological roles. Alternatively, they may have resulted from accidental translations of pseudogene transcripts and possess no function. Comparing human and macaque orthologs, we show that the nonsynonymous to synonymous substitution rate ratio (ω) is significantly smaller for translated pseudogenes than for other pseudogenes. In particular, five of 34 translated pseudogenes amenable to evolutionary analysis have ω values significantly lower than 1, indicative of the action of purifying selection. This and other findings demonstrate that some but not all translated pseudogenes have selected functions at the protein level. Hence, neither ORF disruption nor presence of protein product disproves or proves gene functionality at the protein level.
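For reference, the ratio used above is conventionally defined as shown below. This is the standard textbook definition and its usual interpretation, not a description of the estimation procedure used in the study, which the description does not state. Here d_N denotes nonsynonymous substitutions per nonsynonymous site and d_S denotes synonymous substitutions per synonymous site.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection} \\
\omega \approx 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive selection}
\end{cases}
```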
Project description:Functional genomics has provided evidence that the human genome transcribes a large number of non-coding genes in addition to protein-coding genes, including microRNAs and long non-coding RNAs (lncRNAs). Among the lncRNAs are pseudogenes, which have received little attention in the past compared to other members of this group. However, increasing evidence points to the important role of pseudogenes in diverse cellular functions, and dysregulation of pseudogenes is often associated with various human diseases, including cancer. Like other types of lncRNAs, pseudogenes can also function as master regulators of gene expression and can thus play a critical role in various aspects of tumorigenesis. In this review we discuss the latest developments in pseudogene research, focusing on how pseudogenes impact tumorigenesis through different gene regulation mechanisms. Given the high sequence homology with the corresponding parent genes, we also discuss challenges for pseudogene research.
Project description:The human genome encodes >14,000 pseudogenes, which are evolutionary relics and have long been considered nonfunctional genomic elements. Emerging evidence suggests that pseudogenes can exert important regulatory functions. However, the function of most pseudogenes remains unknown. To fill this gap, we developed an integrated computational pipeline and performed what is, to date, the first set of pseudogene-focused CRISPRi screens in human cells. Our screens identified >100 pseudogenes that are important for cell fitness, with more cell-type-specific functions than their parent genes. In addition, we discovered a cancer-testis unitary pseudogene, MGAT4EP, that interacts with FOXA1, a key regulator in luminal A breast cancer.
Project description:Circular RNAs (circRNAs) are covalently closed non-coding RNAs lacking the 5’ cap and the poly-A tail. Nevertheless, it has been demonstrated that certain circRNAs can undergo active translation. Therefore, aberrantly expressed circRNAs in human cancers could be an unexplored source of tumor-specific antigens, potentially mediating anti-tumor T cell responses. This study presents an immunopeptidomics workflow with a specific focus on generating a circRNA-specific protein fasta reference. The main goal of this workflow is to streamline the process of identifying and validating human leukocyte antigen (HLA) bound peptides potentially originating from circRNAs. We increased the analytical stringency of our workflow by retaining peptides identified independently by two mass spectrometry search engines and/or by applying a group-specific FDR for canonical-derived and circRNA-derived peptides. A subset of circRNA-derived peptides specifically encoded by the region spanning the back-splice junction (BSJ) were validated with targeted MS, and with direct Sanger sequencing of the respective source transcripts. Our workflow identified 54 unique BSJ-spanning circRNA-derived peptides in the immunopeptidome of melanoma and lung cancer samples. Our novel approach enlarges the catalog of source proteins that can be explored for immunotherapy.
Project description:Despite the diversity of the liquid biopsy transcriptomic repertoire, many studies exploit only a single RNA-type signature as a diagnostic biomarker. This frequently results in sensitivity and specificity that are insufficient for diagnostic utility. Combinatorial biomarker approaches may offer a more reliable diagnosis. Here, we investigated the synergistic contributions of circRNA and mRNA signatures derived from blood platelets as biomarkers for lung cancer detection. We developed a comprehensive bioinformatics pipeline permitting an analysis of platelet circRNA and mRNA derived from non-cancer individuals and lung cancer patients. An optimally selected signature is then used to generate the predictive classification model using a machine learning algorithm. Using individual signatures of 21 circRNAs and 28 mRNAs, the predictive models reached an area under the curve (AUC) of 0.88 and 0.81, respectively. Importantly, combinatorial analysis including both types of RNAs resulted in an 8-target signature (6 mRNAs and 2 circRNAs), enhancing the differentiation of lung cancer from controls (AUC of 0.92). Additionally, we identified five biomarkers potentially specific for early-stage detection of lung cancer. Our proof-of-concept study presents the first multi-analyte-based approach for the analysis of platelet-derived biomarkers, providing a potential combinatorial diagnostic signature for lung cancer detection.
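The AUC figures quoted above summarize how well a classifier's scores rank cancer samples above controls. As a minimal sketch of what the metric measures, the function below computes it directly from its pairwise definition: the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as half. This is not the study's pipeline. The function name `roc_auc` and the example labels and scores are made up for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half). labels: 1 = case, 0 = control."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration. Rank-based formulas give the same value more efficiently on large cohorts.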