Project description:Accurate annotation of transcript isoforms is crucial to understand gene functions, but automated methods for reconstructing full-length transcripts from RNA sequencing (RNA-seq) data remain imprecise. We developed Bookend, a software package for transcript assembly that incorporates data from different RNA-seq techniques, with a focus on identifying and utilizing RNA 5′ and 3′ ends. Through end-guided assembly with Bookend we demonstrate that correct modeling of transcript start and end sites is essential for precise transcript assembly. Furthermore, we discovered that utilization of end-labeled reads present in full-length single-cell RNA-seq (scRNA-seq) datasets dramatically improves the precision of transcript assembly in single cells. Finally, we show that hybrid assembly across short-read, long-read, and end-capture RNA-seq datasets from Arabidopsis, as well as meta-assembly of RNA-seq from single mouse embryonic stem cells (mESCs) can produce end-to-end transcript annotations of comparable quality to reference annotations in these model organisms.
Project description:New tools for improved long-read transcript assembly and coalescence with its short-read counterpart are required. Using our short- and long-read measurements from different cell lines with spiked-in standards, we systematically compared key parameters and biases in the read alignment and assembly of transcripts. We report a cDNA synthesis artifact in long-read datasets that impacts the identity and quantitation of assembled transcripts. We developed a computational pipeline to strand long-read cDNA libraries that markedly improves assembly of transcripts from long-reads. Incorporating stranded long-reads in a new hybrid assembly approach, we demonstrate its efficacy for improved characterization of challenging lncRNA transcripts. Our workflow can be applied to a wide range of transcriptomics datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
Project description:Short-read RNA sequencing (RNAseq) remains a cornerstone for transcriptome profiling, but is limited in reconstructing full-length transcripts and capturing transcript diversity. While long-read RNAseq spans entire transcripts and resolves complex structures, this technology is hindered by its high error rates. In parallel, noncoding RNA transcripts remain underrepresented in current references. Here, we present HyDRA (Hybrid de novo RNA Assembly), a pipeline that integrates the accuracy of short reads with the structural resolution of long reads to produce more complete de novo transcriptome assemblies. Benchmarking showed HyDRA to outperform existing methods by up to 40%. Using the HyDRA human ovarian metatranscriptome, we identified >50,000 high-confidence long noncoding RNAs, most of which have not been previously detected using traditional methods. Although long-read RNAseq is advancing, the vast availability of short reads ensures HyDRA’s ongoing role in capturing high-confidence, cell-type specific transcripts and advancing our understanding of transcriptomic complexity and the noncoding genome.
Project description:Evaluation of short-read-only, long-read-only, and hybrid assembly approaches on metagenomic samples, demonstrating how they affect gene and protein prediction, which is relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to evaluate the metagenomic-based protein predictions.
Project description:Self-assembly is a fundamental property of living matter that drives the three-dimensional (3D) organization of cell collectives such as tissues and organs. Here, the co-assembly of synthetic and natural cells is leveraged to create hybrid living 3D cancer cultures. We demonstrate that synthetic cells based on droplet-supported lipid bilayers can establish artificial tumor immune microenvironments (ART-TIMEs), mimicking immunogenic signals within tumoroids formed by the cell line PANC-1 and eliminating the need to integrate complex living immune cells. Using the ART-TIME approach, we identify a co-signaling mechanism between PD-1 and other T cell-derived surface receptors as a driver in immune evasion of pancreatic ductal adenocarcinoma.
Project description:Innate immune responses triggered by Drosophila larval hemocytes have been extensively characterized. However, the full extent of transcriptional and post-transcriptional regulation underlying these processes remains poorly understood. Here, we employed a hybrid sequencing strategy integrating Oxford Nanopore long-read and Illumina short-read sequencing to provide a more comprehensive transcriptome annotation. This enabled discovery of full-length transcripts with novel 5′ and 3′ boundaries and uncovered 349 previously unannotated long non-coding RNAs highly induced during late stages of wasp infestation. To ensure high-confidence transcript models, we further eliminated potential intra-priming artifacts specific to long-read cDNA data. These high-confidence full-length transcript models helped to reveal cell type–specific lncRNA markers in lamellocytes and crystal cells by single-cell analyses, which recapitulated hemocyte differentiation trajectories. Notably, RNAi-based depletion of two highly induced lncRNAs impaired lamellocyte formation under wasp infestation, highlighting their functional relevance. Collectively, our findings provide detailed insights into the Drosophila larval immune transcriptome through long-read sequencing and highlight the regulatory roles of non-coding RNAs in innate immunity.
Project description:Megasphaera hexanoica KCCM 43214T, isolated from cow rumen, is capable of producing medium-chain carboxylic acids such as n-caproate and n-caprylate. In this study, we present a high-quality genome assembly, along with intracellular metabolomic profiling and pangenomic analysis. Illumina sequencing generated 2.3 Mbp from 15,293,634 reads with a GC content of 49.5%, while PacBio HiFi sequencing produced 331.5 Mbp across 45,266 reads, with an average read length of 7,323 bp and a HiFi read N50 of 8,214 bp. Hybrid assembly of short and long reads resulted in a single 2.88 Mbp contig containing 2,075-2,083 unique genes. A genome-scale metabolic model was constructed to evaluate its metabolic capabilities under specific growth conditions. Intracellular metabolomic analysis of cells grown in fructose medium and lactate medium revealed key metabolic activities associated with chain elongation. Pangenomic analysis across nine annotated genomes identified 6,721 orthologous genes using OrthoMCL, emphasizing the genetic and functional diversity within the Megasphaera genus. This dataset offers valuable insights into the metabolism and biotechnological potential of M. hexanoica KCCM 43214T.
Project description:The aim of this project is to promote the breath volatile marker concept for colorectal cancer (CRC) screening by advancing the development and application of a novel hybrid analyzer for this purpose.
The hybrid analyzer concept is expected to benefit from combining data acquired by metal-oxide (MOX) and infrared (IR) sensors. The current study will be the first globally to address this concept in CRC detection. In addition, traditional methods, in particular gas chromatography coupled to mass spectrometry (GC-MS), will be used to address the biological relevance of volatile organic compound (VOC) emissions from cancer tissue and will assist in further advances of the hybrid-sensing approach.