Project description:Here we describe CapTrap-Seq, an experimental workflow designed to address the problem of reduced transcript end detection by long-read RNA sequencing methods, especially at the 5' ends. We applied CapTrap-Seq to profile the transcriptomes of the human heart and brain and compared the results with those obtained using other library preparation approaches. CapTrap-Seq is platform-agnostic; here we tested it on three long-read sequencing platforms: MinION (ONT), Sequel (PacBio) and Sequel II (PacBio).
Project description:The LRGASP challenge encompasses different human, mouse, and manatee samples sequenced using multiple combinations of protocols and platforms. Different challenges will use distinct subsets of the samples for evaluation. The long-read sequencing platforms used in these challenges are the Pacific Biosciences (PacBio) Sequel II, Oxford Nanopore (ONT) MinION and PromethION. Samples will also be sequenced on the Illumina HiSeq 2500. The primary LRGASP library prep protocols are “standard” cDNA sequencing, direct RNA sequencing, R2C2, and CapTrap. Each sample will also include Lexogen SIRV-Set 4 spike-ins. We will also provide simulated PacBio and ONT data as part of the evaluations. This particular study focuses on single strand CAGE sequencing of human iPSCs, defining CAGE peaks from Illumina HiSeq 2500 (SR: 150 cycles) of two biological replicates for use in the LRGASP challenge.
Project description:Alternative splicing contributes to transcriptomic complexity and plays a role in the regulation of cellular identity and function, but correctly assembling the transcripts of complex loci and quantifying them from short-read sequencing is non-trivial. Recent long-read sequencing methods such as those from ONT and PacBio overcome these problems by potentially sequencing full transcripts. The activation of brown adipose tissue, e.g. by exposure to reduced ambient temperature (cold), positively affects metabolism by increasing energy expenditure and releasing endocrine factors, and has been shown to involve specific alternative splicing events. Here we assessed important features of ONT long-read sequencing protocols in relation to Illumina short-read sequencing: (i) alignment characteristics to the reference genome and transcriptome, (ii) gene and transcript detection and quantification, (iii) detection of differential gene and transcript expression events, (iv) transcriptome reannotation and (v) detection of differential transcript usage events. We find that ONT long-read sequencing is advantageous for transcriptome reassembly, especially when the library is enriched for full-length reads. Illumina sequencing, owing to the higher number of counts available, has higher statistical power for calling differentially expressed or differentially used features, whereas long-read sequencing has a lower risk of calling false-positive events because reads can be mapped to transcripts less ambiguously. Finally, we describe novel transcript isoforms in cold-activated murine iBAT reassembled from ONT long reads.
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each displays distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets, both for simple studies and for larger projects. These properties will enable TALON users to move beyond the limitations of short-read data and perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each technology has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as their expression levels, across datasets. It serves both simple studies and larger projects such as ENCODE, which seek to decode transcriptional regulation in the human and mouse genomes and to predict expression levels of genes and transcripts more accurately than is possible with short reads alone.
Project description:Third-generation sequencing from Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) is being used in a rapidly growing number of studies across many research areas. Among these, plant full-length single-molecule transcriptome studies have mostly used PacBio, while ONT has rarely been used. Therefore, in this study, we developed ONT RNA-sequencing methods for plants. We performed a detailed evaluation of reads from PacBio and Nanopore PCR cDNA (ONT Pc) sequencing in a plant (Arabidopsis), covering the characteristics of the raw data and the identification of transcripts. We aim to provide a valuable reference for applications of ONT in plant transcriptome analysis.