Project description: We detected fusion genes in 274 fresh surgical glioma samples using whole-transcriptome sequencing. Using this approach, we screened a panel of glioma samples and identified a number of novel activating fusion transcripts. Fusion detection in 274 glioma patients
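The description does not name the fusion caller that was used. As a generic illustration only, the core idea behind many RNA-Seq fusion detectors can be sketched in Python: count read pairs whose two mates are assigned to different genes, then keep gene pairs with enough supporting reads. The `ReadPairHit` input format, the `min_support` threshold, and the gene names are hypothetical. Real callers also use split reads, breakpoint coordinates, and extensive artifact filtering.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class ReadPairHit(NamedTuple):
    """Gene assigned to each mate of a paired-end read (hypothetical input format)."""
    read_id: str
    gene_mate1: str
    gene_mate2: str


def candidate_fusions(hits: Iterable[ReadPairHit], min_support: int = 3) -> dict[tuple[str, str], int]:
    """Return gene pairs supported by at least `min_support` discordant read pairs.

    A pair is discordant when its two mates map to different genes. Gene pairs are
    stored in sorted order so that A-B and B-A evidence is pooled.
    """
    support: Counter[tuple[str, str]] = Counter()
    for hit in hits:
        if hit.gene_mate1 != hit.gene_mate2:
            a, b = sorted((hit.gene_mate1, hit.gene_mate2))
            support[(a, b)] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


hits = [
    ReadPairHit("r1", "GENE_A", "GENE_B"),
    ReadPairHit("r2", "GENE_B", "GENE_A"),
    ReadPairHit("r3", "GENE_A", "GENE_B"),
    ReadPairHit("r4", "GENE_C", "GENE_C"),  # concordant pair: ignored
    ReadPairHit("r5", "GENE_C", "GENE_D"),  # only one supporting pair
]
print(candidate_fusions(hits))  # {('GENE_A', 'GENE_B'): 3}
```

Pooling both orientations keeps the sketch short but discards which gene is the 5' partner; a fuller version would keep ordered pairs and record breakpoint positions.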
Project description: Adenovirus is a common human pathogen that relies on host cell processes for transcription and processing of viral RNA and for protein production. Although adenoviral promoters, splice junctions, and cleavage and polyadenylation sites have been characterized using low-throughput biochemical techniques or short-read cDNA-based sequencing, these technologies do not fully capture the complexity of the adenoviral transcriptome. By combining Illumina short-read sequencing with nanopore long-read direct RNA sequencing, we mapped transcription start sites and cleavage and polyadenylation sites across the adenovirus genome. In addition to confirming the known canonical viral early and late RNA cassettes, our analysis of splice junctions within long RNA reads revealed an additional 35 novel viral transcripts. These RNAs include fourteen transcripts with new splice junctions that lead to expression of canonical open reading frames (ORFs), six novel ORF-containing transcripts, and fifteen transcripts encoding messages that potentially alter protein function through truncation or fusion of canonical ORFs. We also detected RNAs that bypass canonical cleavage sites and generate potential chimeric proteins by linking separate gene transcription units. Among these, we detected an evolutionarily conserved protein containing the N-terminus of E4orf6 fused to the downstream DBP/E2A ORF. Loss of this novel protein, E4orf6/DBP, was associated with aberrant viral replication center morphology and poor viral spread. Our work highlights how long-read sequencing technologies can reveal further complexity within viral transcriptomes.
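Splice junctions within long reads are typically read off spliced alignments. As a minimal sketch (not the authors' pipeline), the snippet below extracts introns from SAM-style alignments, where a CIGAR `N` operation marks a skipped reference region, then tallies junctions that are supported by several reads but absent from an annotation set. The `(pos, cigar)` tuples, the `known` set, and `min_reads` are hypothetical inputs; coordinates are 1-based and inclusive, following the 1-based SAM `POS` field.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def splice_junctions(pos: int, cigar: str) -> list[tuple[int, int]]:
    """Return (intron_start, intron_end), 1-based inclusive, for each N operation."""
    junctions = []
    ref = pos  # SAM POS: 1-based reference position of the first aligned base
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


def novel_junctions(
    alignments: Iterable[tuple[int, str]],
    known: set[tuple[int, int]],
    min_reads: int = 2,
) -> dict[tuple[int, int], int]:
    """Count junctions across reads and keep unannotated ones with enough read support."""
    counts = Counter(j for pos, cigar in alignments for j in splice_junctions(pos, cigar))
    return {j: n for j, n in counts.items() if n >= min_reads and j not in known}


reads = [
    (100, "10M50N20M"),
    (95, "5S15M50N10M"),  # the soft clip does not consume the reference
    (100, "10M40N20M"),
]
print(splice_junctions(100, "10M50N20M5I30M"))  # [(110, 159)]
print(novel_junctions(reads, known={(110, 149)}))  # {(110, 159): 2}
```

Requiring several supporting reads is a simple guard against alignment noise, which matters for error-prone long reads; production tools additionally check splice-site motifs and correct junctions toward nearby annotated sites.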
Project description: The protein diversity of mammalian cells is determined by the array of isoforms expressed from each gene. Protein mutation is essential to species evolution and cancer development. Accurate long-read transcriptome sequencing at the single-cell level is required to decipher the spectrum of protein expression in mammalian organisms. In this report, we developed a synthetic long-read single-cell sequencing technology based on the LOOPseq technique. We applied this technology to analyze 447 transcriptomes from hepatocellular carcinoma (HCC) and benign liver of one individual. Through Uniform Manifold Approximation and Projection (UMAP) analysis, we identified a panel of mutated mRNA isoforms highly specific to HCC cells. We also identified the evolutionary pathways that led to hypermutation clusters in single human leukocyte antigen (HLA) molecules, and detected novel fusion transcripts. Combining gene expression, fusion transcripts, and expression of mutated genes significantly improved the classification of liver cancer cells versus benign hepatocytes. In conclusion, LOOPseq single-cell technology may hold promise for a new level of precision in analyzing the mammalian transcriptome.
Project description: The advantages of RNA-Seq over array-based platforms include quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these applications is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA samples isolated from breast tumors, using both the hybridization-based NanoString nCounter (226-gene panel) and whole-transcriptome RNA-Seq with RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole-transcriptome expression of lincRNAs and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with the NanoString and ScriptSeq protocols, respectively. For matched fresh-frozen and FFPE samples, gene expression yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole-transcriptome protocols, respectively. For lincRNAs specifically, we observed a very high Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. Comparing FFPE samples across the NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high-confidence SNVs and 24% of high-confidence fusion transcripts. The limited sensitivity of fusion transcript detection was not overcome by increasing sequencing depth up to 3-fold (from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between the NanoString and RNA-Seq platforms suggests that discovery-based whole-transcriptome studies of FFPE material will produce reliable expression data.
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNVs and fusion transcripts was less sensitive.
We performed RNASeq on RNA from nine matched pairs of fresh-frozen and FFPE tissues from breast cancer patients. The goal was to test the RiboZeroGold ScriptSeq complete low input library preparation kit for degraded RNA samples.
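The two headline metrics in this comparison are the Pearson correlation of expression between matched samples and the fraction of high-confidence calls recovered in FFPE libraries. A minimal standard-library sketch of both follows. It assumes expression is compared on a log2(x + 1) scale, which the description does not state; the function names and toy values are hypothetical.

```python
import math
from statistics import mean


def log2p1(values: list[float]) -> list[float]:
    """log2(x + 1) transform, an assumed (common) scale for comparing expression."""
    return [math.log2(v + 1) for v in values]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pair_correlation(pairs: list[tuple[list[float], list[float]]]) -> float:
    """Mean Pearson r over matched (fresh-frozen, FFPE) expression vectors."""
    return mean(pearson(log2p1(ff), log2p1(ffpe)) for ff, ffpe in pairs)


def sensitivity(detected: set[str], high_confidence: set[str]) -> float:
    """Fraction of high-confidence calls (e.g. SNVs or fusions) recovered in a library."""
    return len(detected & high_confidence) / len(high_confidence)


print(sensitivity({"snv1", "snv2", "snv5"}, {"snv1", "snv2", "snv3", "snv4"}))  # 0.5
print(round(159 / 56, 2))  # 2.84, i.e. the "up to 3-fold" depth increase
```

Averaging per-pair correlations, as `mean_pair_correlation` does, matches the "mean Pearson correlation" wording; note that calls outside the high-confidence set (`snv5` above) do not affect sensitivity and would instead be assessed as false positives.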
Project description: Alternative splicing contributes to transcriptomic complexity and plays a role in regulating cellular identity and function, but correctly assembling the transcripts of complex loci, and quantifying them, from short-read sequencing data is non-trivial. Recent long-read sequencing methods, such as those from Oxford Nanopore Technologies (ONT) and PacBio, can overcome these problems by potentially sequencing full-length transcripts. Activation of brown adipose tissue, for example by exposure to reduced ambient temperature (cold), positively affects metabolism by increasing energy expenditure and releasing endocrine factors, and has been shown to involve specific alternative splicing events. Here we assessed important features of ONT long-read sequencing protocols relative to Illumina short-read sequencing: (i) alignment characteristics against the reference genome and transcriptome, (ii) gene and transcript detection and quantification, (iii) detection of differential gene and transcript expression events, (iv) transcriptome reannotation, and (v) detection of differential transcript usage events. We find that ONT long-read sequencing is advantageous for transcriptome reassembly, especially when the reads are enriched for full-length reads. Owing to its higher read counts, Illumina sequencing has greater statistical power for calling differentially expressed or used features, whereas long-read sequencing carries a lower risk of false-positive calls because reads can be mapped to transcripts unambiguously more often. Finally, we describe novel transcript isoforms in cold-activated murine interscapular brown adipose tissue (iBAT) reassembled from ONT long reads.
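Differential transcript usage asks whether the share of a gene's reads coming from each isoform changes between conditions, and the power argument above comes down to counts: the standard error of an estimated proportion shrinks as read counts grow. The following is a simplified sketch, assuming per-transcript read counts and a transcript-to-gene map given as plain dictionaries (hypothetical inputs and names). It computes usage shifts and a binomial standard error; it is not the statistical test used in the study.

```python
from __future__ import annotations

import math
from collections import defaultdict


def transcript_usage(counts: dict[str, float], tx2gene: dict[str, str]) -> dict[str, float]:
    """Fraction of each gene's reads assigned to each of its transcripts."""
    gene_totals: dict[str, float] = defaultdict(float)
    for tx, c in counts.items():
        gene_totals[tx2gene[tx]] += c
    return {
        tx: c / gene_totals[tx2gene[tx]] if gene_totals[tx2gene[tx]] > 0 else 0.0
        for tx, c in counts.items()
    }


def usage_shift(before: dict[str, float], after: dict[str, float], tx2gene: dict[str, str]) -> dict[str, float]:
    """Change in transcript usage (after minus before) for transcripts counted in both."""
    u_before = transcript_usage(before, tx2gene)
    u_after = transcript_usage(after, tx2gene)
    return {tx: u_after[tx] - u_before[tx] for tx in u_before.keys() & u_after.keys()}


def usage_standard_error(p: float, n_gene_reads: int) -> float:
    """Binomial standard error of a usage proportion p estimated from n reads."""
    return math.sqrt(p * (1 - p) / n_gene_reads)


tx2gene = {"t1": "g1", "t2": "g1", "t3": "g2"}
warm = {"t1": 80, "t2": 20, "t3": 50}
cold = {"t1": 30, "t2": 70, "t3": 10}
shift = usage_shift(warm, cold, tx2gene)
print({tx: round(d, 2) for tx, d in sorted(shift.items())})  # {'t1': -0.5, 't2': 0.5, 't3': 0.0}

# Same usage estimate from tenfold more reads: the standard error drops by sqrt(10).
print(round(usage_standard_error(0.3, 100), 4), round(usage_standard_error(0.3, 1000), 4))  # 0.0458 0.0145
```

The sketch only shows the quantities involved. It ignores the other half of the trade-off described above: short reads that are compatible with several isoforms must be apportioned probabilistically, which inflates uncertainty in ways a simple binomial error does not capture.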