Project description:Genomic sequences with high sequence similarity, such as parent-pseudogene pairs, cause short sequencing reads to align to multiple locations, thus complicating genomic analyses. However, their impact on transcriptomic analyses, including the estimation of gene expression and transcript annotation, has been less studied. Here, we investigated the impact of pseudogenes on transcriptomic analyses.
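To illustrate the multi-mapping problem, here is a minimal Python sketch that reports every exact match of a read in a small reference. It is not the authors' pipeline. The parent and pseudogene sequences, the read names, and the helpers `find_exact_hits` and `classify` are all hypothetical. Exact string matching is an assumed simplification, because real aligners also tolerate mismatches and gaps.

```python
# Minimal sketch (hypothetical helpers and toy sequences, not the authors' pipeline):
# exact string matching stands in for a real aligner, which would also allow mismatches.

def find_exact_hits(read: str, reference: dict[str, str]) -> list[tuple[str, int]]:
    """Return every (sequence name, 0-based start) where `read` occurs exactly."""
    hits = []
    for name, seq in reference.items():
        start = seq.find(read)
        while start != -1:
            hits.append((name, start))
            start = seq.find(read, start + 1)  # keep scanning, allowing overlapping hits
    return hits


def classify(hits: list[tuple[str, int]]) -> str:
    """Label a read by how many loci it matches."""
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "multi-mapping"


# Hypothetical toy sequences: the pseudogene copy shares a stretch with its parent.
reference = {
    "parent_gene": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "pseudogene": "TTACCATTGTAATGGGCCGCTGAAAGGTTTAA",
}
reads = {
    "shared_read": "CATTGTAATGGGCCGC",        # lies in the region both copies share
    "parent_specific_read": "GGGTGCCCGATAG",  # spans parent-only sequence
}

for read_id, read in reads.items():
    hits = find_exact_hits(read, reference)
    print(read_id, classify(hits), hits)
```

A read taken from the shared stretch matches both loci, so it cannot be assigned unambiguously. A read that spans parent-specific sequence maps to a single locus.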
Project description:We analyzed transcriptomic data from infected and uninfected T cells to identify pseudogenes and their parent genes that are differentially expressed during HIV-1 infection.
Project description:The human genome encodes >14,000 pseudogenes, which are evolutionary relics and have long been considered nonfunctional genomic elements. Emerging evidence suggests that pseudogenes can exert important regulatory functions; however, the function of most pseudogenes remains unknown. To fill this gap, we developed an integrated computational pipeline and performed the first pseudogene-focused CRISPRi screens in human cells to date. Our screens identified >100 pseudogenes that are important for cell fitness and whose functions are more cell-type-specific than those of their parent genes. In addition, we discovered a cancer-testis unitary pseudogene, MGAT4EP, that interacts with FOXA1, a key regulator in luminal A breast cancer.
Project description:We analyzed transcriptomic data from infected and uninfected T cells to identify pseudogenes and their parent genes that are differentially expressed during HIV-1 infection. The H9 T-cell line was infected with the NL4-3 strain of HIV-1, which was obtained by transfection of 293T cells. RNA from infected and uninfected cells was extracted 7 days post-infection.
Project description:The human genome harbors 15,000 pseudogenes. Apart from the very few that transcribe non-coding RNAs or encode truncated proteins, most lack transcriptional capacity, and their functions are unknown. Here, we found that pseudogene DNA sequences can form chromatin contacts, acting as anchors of chromatin loops and boundaries of topologically associating domains (TADs). Many of these pseudogenes proved to be structurally important and essential for the survival of human embryonic stem cells (hESCs). Incorporating genetic data, we identified a Hominoidea-specific pseudogene, TUBBP2, which acts as a TAD boundary by enriching CTCF and thereby maintains the 3D genome and the self-renewal of hESCs. Evolutionarily, TUBBP2 arose 18.8 million years ago through retroposition and inherited the CTCF-binding motif from its parent gene, TUBB, thus enhancing the strength of TADs at the insertion site in great ape genomes. Moreover, through inheritance from parent genes or through sequence variation, some pseudogenes can introduce additional CTCF-binding sequences at their insertion sites to generate species-specific topological domains, which may contribute to species evolution. Overall, we not only demonstrate that pseudogenes are essential for the formation and maintenance of 3D chromatin structure but also provide insights into their role in driving species evolution.
Project description:Pseudogenes are gene copies presumed to mainly be functionless relics of evolution due to acquired deleterious mutations or transcriptional silencing. When transcribed, pseudogenes may encode proteins or enact RNA-intrinsic regulatory mechanisms. However, the extent, characteristics and functional relevance of the human pseudogene transcriptome are unclear. Short-read sequencing platforms have limited power to resolve and accurately quantify pseudogene transcripts owing to the high sequence similarity of pseudogenes and their parent genes. Using deep full-length PacBio cDNA sequencing of normal human tissues and cancer cell lines, we identify here hundreds of novel transcribed pseudogenes. Pseudogene transcripts are expressed in tissue-specific patterns, exhibit complex splicing patterns and contribute to the coding sequences of known genes. We survey pseudogene transcripts encoding intact open reading frames (ORFs), representing potential unannotated protein-coding genes, and demonstrate their efficient translation in cultured cells. To assess the impact of noncoding pseudogenes on the cellular transcriptome, we delete the nucleus-enriched pseudogene PDCL3P4 transcript from HAP1 cells and observe hundreds of perturbed genes. This study highlights pseudogenes as a complex and dynamic component of the transcriptional landscape underpinning human biology and disease.
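To illustrate what an "intact ORF" check can involve, here is a minimal Python sketch that finds the longest ATG-to-stop ORF on the forward strand of a transcript. It is not the authors' method. The helper names, the forward-strand-only scan, and the 90% length cutoff relative to the parent protein used in `is_intact` are hypothetical assumptions chosen for illustration.

```python
# Minimal sketch (hypothetical helpers and cutoff, not the authors' method):
# scans only the forward strand and uses the standard genetic code's stop codons.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (0, 0) means no complete ORF.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None  # position of the first ATG since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def orf_protein_length(seq: str) -> int:
    """Number of amino acids encoded by the longest ORF (stop codon excluded)."""
    start, end = longest_orf(seq)
    return max((end - start) // 3 - 1, 0)


def is_intact(seq: str, parent_protein_len: int, min_fraction: float = 0.9) -> bool:
    """Hypothetical criterion: the ORF covers at least `min_fraction` of the parent protein."""
    return orf_protein_length(seq) >= min_fraction * parent_protein_len


transcript = "CCATGAAATTTGGGTAACC"  # toy sequence: ATG AAA TTT GGG TAA
print(longest_orf(transcript), orf_protein_length(transcript))
```

A premature in-frame stop codon shortens the longest ORF. With the same parent protein length, `is_intact` then returns False.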
Project description:The human genome harbors 15,000 pseudogenes. Apart from the very few that transcribe non-coding RNAs or encode truncated proteins, most lack transcriptional capacity, and their functions are unknown. Here, we proposed and verified that pseudogene DNA sequences can form chromatin contacts that act as anchors of chromatin loops and boundaries of topologically associating domains (TADs). Strikingly, owing to the sequence specificity of pseudogenes, the TAD boundaries that contain them in the human genome were primarily species-specific, including human-specific and primate-specific boundaries. We found that during primate evolution, certain pseudogenes, through inheritance from their parent genes, can introduce additional transcription factor-binding sequences, especially CTCF-binding sequences, at their insertion sites to generate species-specific TAD boundaries. Because of this participation in TAD evolution, CTCF-binding motifs on pseudogenes were subject to significantly heightened selection pressure. Deleting these pseudogenes in human embryonic stem cells (hESCs) disrupted the structure of species-specific TADs, whereas inserting them into mouse embryonic stem cells (mESCs) mediated the formation of new TADs. Moreover, pseudogenes involved in three-dimensional (3D) genome formation were critical for maintaining hESC self-renewal. The structural necessity and biological function of these pseudogenes demonstrate the broad significance of pseudogenes in 3D genome construction and evolution.