Project description:In recent years, long-read sequencing technologies have detected transcript isoforms with unprecedented accuracy and resolution. However, it remains unclear whether long-read sequencing can effectively disentangle the isoform landscape of complex allele-specific loci that arise from genetic or epigenetic differences between alleles. Here, we combine the PacBio Iso-Seq workflow with the established phasing approach WhatsHap to assign long reads to the corresponding allele in polymorphic F1 mouse hybrids. Upon comparing the long-read sequencing results with matched short reads, we observed general consistency in the allele-specific information and were able to confirm the imprinting status of known imprinted genes. We then explored the complex imprinted Gnas locus known for allele-specific non-coding and coding isoforms and were able to benchmark historical observations. This approach also allowed us to detect isoforms from both the active and inactive X chromosomes of genes that escape X chromosome inactivation. The described workflow offers a promising framework and demonstrates the power of long-read transcriptomic data to provide mechanistic insight into complex allele-specific loci.
Project description:Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells. Here, we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5’ unique molecular identifier (UMI) RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele resolution applicable to large-scale characterization of cell types and states across tissues and organisms.
Project description:Accurate quantification of transcript isoforms is crucial for understanding gene regulation, functional diversity, and cellular behavior. Existing methods using either short-read (SR) or long-read (LR) RNA sequencing have significant limitations: SR sequencing provides high depth but struggles with isoform deconvolution, while LR sequencing offers isoform resolution at the cost of lower depth, higher noise, and technical biases. Addressing this gap, we introduce Multi-Platform Aggregation and Quantification of Transcripts (MPAQT), a generative model that combines the complementary strengths of different sequencing platforms to achieve state-of-the-art isoform-resolved transcript quantification, as demonstrated by extensive simulations and experimental benchmarks. Applying MPAQT to an in vitro model of human embryonic stem cell differentiation into cortical neurons, followed by machine learning-based modeling of mRNA abundance determinants, reveals the role of untranslated regions (UTRs) in isoform regulation through isoform-specific interactions with RNA-binding proteins that modulate mRNA stability. These findings highlight MPAQT's potential to enhance our understanding of transcriptomic complexity and underline the role of splicing-independent post-transcriptional mechanisms in shaping the isoform and exon usage landscape of the cell.
Project description:In this study, we used a barcoding-based synthetic long read (SLR) isoform sequencing approach (LoopSeq) to generate sequencing reads sufficiently long and accurate to identify isoforms using standard short read Illumina sequencers.
Project description:The goal of this project was to perform long-read RNA sequencing (LR-seq, PacBio) in combination with short-read RNA-seq for systematic characterization of the isoform diversity in primary breast tumor samples. We sequenced the full-length transcriptomes of 26 breast tumors and 4 normal breast samples.
Project description:Transcription and translation are intertwined processes where mRNA isoforms are crucial intermediaries. However, methodological limitations in analyzing translation at the mRNA isoform level have impaired our ability to comprehensively establish links between the full-length transcripts and the translatome. This has left gaps in our understanding of critical biological processes, regulatory mechanisms, and disease progression. To address this, we develop an integrated computational and experimental framework called long-read Ribo-STAMP (LR-Ribo-STAMP). LR-Ribo-STAMP capitalizes on advancements in long-read sequencing and RNA-base editing-mediated technologies to simultaneously and scalably profile translation and transcription at both gene and mRNA isoform levels for the first time. In this report, we show agreement between gene-level translation profiles obtained with LR-Ribo-STAMP and those from previously validated short-read Ribo-STAMP data in unperturbed cells. At the mRNA isoform level, we show that LR-Ribo-STAMP successfully profiles translation in unperturbed cells and links mRNA isoforms and regulatory features, such as upstream ORFs (uORFs) and regulatory sequences, to translation measurements. We further demonstrate the method’s effectiveness in profiling disease models by profiling translation at gene and isoform levels in a triple-negative breast cancer cell line under normoxia and hypoxia. Here, we find that LR-Ribo-STAMP effectively delineates orthogonal transcriptional and translational shifts between conditions at gene and isoform levels. At the isoform level, LR-Ribo-STAMP uniquely identifies key regulatory elements and shifts in mRNA isoform transcription that correlate with changes in translation, providing an example of insight that can inform the development of novel therapeutics.
Overall, LR-Ribo-STAMP is a significant advancement in translation methods and can have profound implications for basic research and clinical applications.
Project description:Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). We generated sixteen million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias towards higher mapping rates of the allele in the reference sequence, compared to the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ~5-10% of SNPs still have an inherent bias towards more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome, and analyzing the simulation output are available upon request from JFD. RNA-Seq on two YRI HapMap cell lines. Each individual was sequenced on two lanes of the Illumina Genome Analyzer.
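The two operations named in the description above, masking known SNP positions in a reference and checking a heterozygous SNP for allelic imbalance, can be illustrated with a minimal Python sketch. This is not the authors' code (their scripts are in Perl and R and are not shown here). The function names `mask_snps`, `reference_fraction` and `binomial_two_sided_p` are hypothetical, and the exact binomial test against an expected 50:50 ratio is an assumption, since the description does not name the statistical test used.

```python
from math import comb


def mask_snps(reference: str, snp_positions: list[int], mask_char: str = "N") -> str:
    """Return a copy of the reference with each 0-based SNP position masked."""
    bases = list(reference)
    for pos in snp_positions:
        bases[pos] = mask_char
    return "".join(bases)


def reference_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous SNP that carry the reference allele.

    Assumes at least one read covers the SNP.
    """
    return ref_reads / (ref_reads + alt_reads)


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for the observed reference-read count.

    Assumption for illustration: without bias or ASE, each read carries the
    reference allele with probability p_ref (0.5 by default).
    """
    n = ref_reads + alt_reads
    probs = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    print(mask_snps("ACGTACGT", [1, 4]))
    print(reference_fraction(30, 10))
    print(binomial_two_sided_p(30, 10))
```

In a real analysis the same test would be applied to simulated reads as well, so that SNPs showing imbalance in the absence of any true ASE can be flagged as inherently biased and filtered out.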