Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each displays distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets in both simple studies and larger projects. These properties will enable TALON users to move beyond the limitations of short-read data and perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq cannot resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as their expression levels, across datasets, both in simple studies and in larger projects such as ENCODE, which seek to decode transcriptional regulation in the human and mouse genomes and to predict expression levels of genes and transcripts more accurately than is possible with short reads alone.
Project description:A new genome of Fraxinus excelsior was assembled using a hybrid approach combining Nanopore and Illumina data (BioProject PRJNA865134, SAMN30100368, genome JANJPF000000000). Methylation was also assessed in the genome. Manuscript title: Fraxinus excelsior updated long-read genome reveals the importance of MADS-box genes in tolerance mechanisms against ash dieback, G3: Genes|Genomes|Genetics
Project description:Transposon insertion site sequencing (TIS) is a powerful method for associating genotype with phenotype. However, all TIS methods described to date use short nucleotide sequence reads, which cannot uniquely determine the locations of transposon insertions within repeating genomic sequences where the repeat units are longer than the read length. To overcome this limitation, we have developed a TIS method that uses Oxford Nanopore sequencing technology to generate and analyze long nucleotide sequence reads; we have called this method LoRTIS (Long Read Transposon Insertion-site Sequencing). The experimental data comprise sequence files generated on the Nanopore and Illumina platforms. Biotin1308.fastq.gz and Biotin2508.fastq.gz are FASTQ files generated with Nanopore technology. Rep1-Tn.fastq.gz and Rep1-Tn.fastq.gz are FASTQ files generated on the Illumina platform. In this study, we compared the efficiency of the two methods in identifying transposon insertion sites.
Project description:Genome-wide 5-methylcytosine (5mC) profiling at CpG dinucleotides in Hydra viridissima using Oxford Nanopore long-read sequencing with Dorado base modification detection. Five ONT runs (one symbiotic, four aposymbiotic clone 2) were basecalled with Dorado (sup,5mCG_5hmCG) and aligned to the Carnegie v1 genome assembly (JBWVZK000000000), and methylation was quantified with modkit. Global CpG methylation is ~9-10% and bimodal (88% unmethylated, 7% fully methylated). Unique genomic regions show higher methylation (12%) than repetitive regions (7.5%).
Project description:This study benchmarks bulk and single-cell long-read RNA sequencing technologies in a human neuronal model of Fragile X syndrome. NGN2-induced neurons were generated from patient-derived iPSCs carrying a silenced FMR1 gene (FXS line E3) and an isogenic CRISPR-corrected rescue line (IsoB11) in which FMR1 expression is restored. These conditions provide a defined system to evaluate transcript detection and quantification across sequencing platforms. Bulk and single-cell RNA-seq datasets were generated using Illumina short-read sequencing and long-read sequencing from Pacific Biosciences (PB) and Oxford Nanopore Technologies (ONT). Single-cell libraries were prepared using the 10x Genomics Chromium platform. ERCC and SIRV spike-in controls were added to bulk samples to enable benchmarking of transcript quantification accuracy. Three biological replicates were sequenced for each condition. The dataset enables cross-platform comparisons of transcript detection, quantification methods, transcript length biases, and sequencing depth requirements for long-read transcriptomic analyses.
Project description:In this work, we collected and analyzed brain mRNAs from two cohorts of mice, young adult and aged adult, and determined their levels using second-generation (Illumina) and third-generation (Oxford Nanopore) sequencing technologies. We report a transcriptome-wide study of differential transcript usage during brain aging. In addition, we provide the community with a large resource of whole-brain transcriptomes and comprehensive analyses that identify widespread diversity of mRNAs during aging.