ABSTRACT: TrEMOLO : Accurate transposable element allele frequency estimation using long-read sequencing data combining assembly and mapping-based approaches
Project description:Transposable elements, known colloquially as “jumping genes,” constitute approximately 45% of the human genome. Cells utilize epigenetic defenses to limit transposable element jumping, including formation of silencing heterochromatin and generation of piwi-interacting RNAs (piRNAs), small RNAs that facilitate clearance of transposable element transcripts. Here we identify transposable element activation as a key mediator of neuronal death in tauopathies, a group of neurodegenerative disorders, including Alzheimer’s disease, that are pathologically characterized by deposits of tau protein in the brain. Mechanistically, we find that heterochromatin decondensation and reduction of piwi/piRNAs drive transposable element activation in tauopathy. Using genetic and pharmacological approaches in a Drosophila melanogaster model of tauopathy, we provide evidence for a causal relationship between pathogenic tau-induced heterochromatin decondensation, piwi/piRNA depletion, active transposable element mobilization, and neurodegeneration. We further report a significant increase in transcripts of the endogenous retrovirus class of transposable elements in human Alzheimer’s disease and progressive supranuclear palsy, suggesting that transposable element dysregulation is conserved in human tauopathy. Taken together, our data identify heterochromatin decondensation, piwi/piRNA depletion and consequent transposable element activation as a novel, pharmacologically targetable, mechanistic driver of neurodegeneration in tauopathy.
Project description:Optimal brain function requires that neurons carry out extensive post-transcriptional RNA processing to produce a vast diversity of transcripts. Accurate reconstruction and quantification of highly processed RNA using standard RNA sequencing approaches is challenging due to their short read lengths. Long-read direct RNA sequencing can resolve multiple variations within RNA isoforms by capturing full-length transcripts spanning multiple exon-exon junctions, repetitive regions (e.g. retrotransposons), and intronic structures. Here we produce an isoform-level map of post-transcriptional RNA modifications using Oxford Nanopore Technologies (ONT) long-read sequencing of native RNA strands extracted from heads of Drosophila melanogaster aged to day 10 of adulthood. In addition to identifying 930 transcripts that are not present in the reference transcriptome, we find that almost half of the total detected isoforms have polyadenylated tails in excess of 104 nucleotides and that over 59% of transcripts possessed detectable m6A-modified bases. RNA modifications are present in RNA transcribed from transposable elements, which are important drivers of genetic diversity and relevant to human neurodegenerative diseases, including Alzheimer’s disease and related tauopathies. Applying nanopore direct RNA sequencing to a Drosophila model of tauopathy with known transposable element activation and various types of errors in RNA handling reveals exceptionally diverse RNA processing events in regions that are considered difficult to characterize with traditional short-read sequencing. Taken together, we have uncovered complex transcript structures in adult Drosophila head in a physiological setting and in the context of tauopathy, laying the groundwork for future studies to characterize the diverse tau transcriptome in brain tissue from patients with Alzheimer’s disease and related tauopathies.
Project description:With an ability to compromise genome integrity, transposable elements (TEs) have significant associations with human diseases. Short-read sequencing has been used to study the expression of TEs; however, the highly repetitive nature of these elements makes multimapping a critical issue. Here we implement LocusMasterTE, an improved quantification method that integrates long-read sequencing. By introducing transcripts per million (TPM) counts computed from long-read sequencing as a prior distribution in the Expectation-Maximization (EM) model used for short-read TE quantification, LocusMasterTE reassigns multi-mapped reads to the correct expression values. Based on simulated short reads, LocusMasterTE outperforms current quantitative approaches and is notably advantageous in capturing newly inserted TEs. We also verified that TEs quantified by LocusMasterTE are clearly related to euchromatin and heterochromatin in cell line samples. With LocusMasterTE we anticipate that more accurate quantification can be performed, allowing novel functions of TEs to be uncovered.
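The idea of using long-read TPM values as a prior when an EM model reassigns multi-mapped short reads can be illustrated with a minimal sketch. This is not the LocusMasterTE implementation: the function name `em_reassign`, the `prior_strength` pseudocount weighting, and the toy input format (one tuple of candidate loci per read) are all assumptions made for illustration only.

```python
def em_reassign(read_hits, long_read_tpm, prior_strength=1.0, n_iter=50):
    """Toy EM for multi-mapped reads with a long-read-derived prior.

    read_hits      : list of tuples, each holding the candidate loci of one read
    long_read_tpm  : dict locus -> TPM from long reads (hypothetical input)
    prior_strength : pseudocount weight given to the prior (assumed parameter)
    Returns a dict locus -> expected number of reads assigned to it.
    """
    loci = sorted({locus for hits in read_hits for locus in hits})
    if not loci:
        return {}

    # Normalise long-read TPM into a prior over the observed loci;
    # fall back to a uniform prior if no long-read signal is available.
    total_tpm = sum(long_read_tpm.get(locus, 0.0) for locus in loci)
    if total_tpm > 0:
        prior = {locus: long_read_tpm.get(locus, 0.0) / total_tpm for locus in loci}
    else:
        prior = {locus: 1.0 / len(loci) for locus in loci}

    # Start from uniform relative abundances.
    theta = {locus: 1.0 / len(loci) for locus in loci}
    n_reads = len(read_hits)
    counts = dict.fromkeys(loci, 0.0)

    for _ in range(n_iter):
        # E-step: split each read across its candidate loci in proportion
        # to the current abundance estimates.
        counts = dict.fromkeys(loci, 0.0)
        for hits in read_hits:
            norm = sum(theta[locus] for locus in hits)
            for locus in hits:
                share = theta[locus] / norm if norm > 0 else 1.0 / len(hits)
                counts[locus] += share

        # M-step: update abundances, adding the prior as pseudocounts.
        theta = {
            locus: (counts[locus] + prior_strength * prior[locus])
            / (n_reads + prior_strength)
            for locus in loci
        }

    return counts


if __name__ == "__main__":
    # Two ambiguous reads map to both locus_A and locus_B.
    reads = [("locus_A",), ("locus_A", "locus_B"), ("locus_A", "locus_B"), ("locus_B",)]
    tpm = {"locus_A": 90.0, "locus_B": 10.0}
    print(em_reassign(reads, tpm))
```

In this toy input the short-read evidence is symmetric between the two loci, so any shift of the ambiguous reads toward `locus_A` comes from the long-read prior alone. Setting `prior_strength` to zero recovers a plain EM without the prior.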
Project description:We have used the genetic resources of Arabidopsis thaliana to generate mutant lines with reactivated transposable element (TE) expression. We sequenced these lines with long-read Oxford Nanopore technology to capture TE mRNAs for TE transcript annotation.
Project description:Ongoing improvements to next generation sequencing technologies are leading to longer sequencing read lengths, but a thorough understanding of the impact of longer reads on RNA sequencing analyses is lacking. To address this issue, we generated and compared two RNA sequencing datasets of differing read lengths -- 2x75 bp (L75) and 2x262 bp (L262) -- and investigated the impact of read length on various aspects of analysis, including the performance of currently available read-mapping tools, gene and transcript quantification, and detection of allele-specific expression patterns. Our results indicate that, while the scalability of read-mapping tools and the cost-effectiveness of the long-read protocol are issues that require further attention, longer reads enable more accurate quantification of diverse aspects of gene expression, including individual-specific patterns of allele-specific expression and alternative splicing. Two RNA-Seq datasets of differing read lengths (2x262 bp and 2x75 bp).