Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each technology has its own distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. The TALON pipeline for technology-agnostic, long-read transcriptome discovery and quantification tracks both known and novel transcript models, as well as their expression levels, across datasets. It supports both simple studies and larger projects such as ENCODE that seek to decode transcriptional regulation in the human and mouse genomes and to predict expression levels of genes and transcripts more accurately than is possible with short reads alone.
Project description:With the recent advancements in genome editing, next generation sequencing (NGS), and scalable cloning techniques, scientists can now conduct genetic screens at unprecedented levels of scale and precision. With such a multitude of technologies, there is a need for a simple yet comprehensive pipeline to enable systematic mammalian genetic screening. In this study, we develop novel algorithms for target identification and a toxin-less Gateway cloning tool for library cloning, termed MegaGate. When combined with existing genetic perturbation methods and NGS-coupled readouts, these enable versatile engineering of relevant mammalian cell lines. Our integrated pipeline for Sequencing-based Target Ascertainment and Modular Perturbation Screening (STAMPScreen) can thus be utilized for a host of cell state engineering applications.
Project description:Alternative splicing is widely acknowledged to be a crucial regulator of gene expression and is a key contributor to both normal developmental processes and disease states. While cost-effective and accurate for quantification, short-read RNA-seq lacks the ability to resolve full-length transcript isoforms despite increasingly sophisticated computational methods. Long-read sequencing platforms such as Pacific Biosciences (PacBio) and Oxford Nanopore (ONT) bypass the transcript reconstruction challenges of short reads. Here we describe TALON, the ENCODE4 pipeline for analyzing PacBio cDNA and ONT direct-RNA transcriptomes. We apply TALON to three human ENCODE Tier 1 cell lines and show that while both technologies perform well at full-transcript discovery and quantification, each one displays distinct artifacts. We further apply TALON to mouse cortical and hippocampal transcriptomes and find that a substantial proportion of neuronal genes have more reads associated with novel isoforms than with annotated ones. These data show that TALON is a technology-agnostic long-read transcriptome discovery and quantification pipeline capable of tracking both known and novel transcript models, as well as their expression levels, across datasets in both simple studies and larger projects. These properties will enable TALON users to move beyond the limitations of short-read data to perform isoform discovery and quantification in a uniform manner on existing and future long-read platforms.
Project description:Background: Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific barcodes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. Findings: zUMIs is a pipeline that can handle both known and random BCs and also efficiently collapse UMIs, either just for exon-mapping reads or for both exon- and intron-mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is the adaptive downsampling function, which facilitates dealing with hugely varying library sizes and also allows the user to evaluate whether the library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and show that more than 35% of all reads map to introns. We also show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. Conclusions: The flexibility of zUMIs makes it possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs, and it is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.
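The tabulation by BC and UMI described above can be illustrated with a minimal Python sketch. This is not zUMIs code: the function name `count_umis`, the `(barcode, umi, gene)` tuple layout, and the toy reads are all hypothetical, and the sketch assumes exact-match UMI collapsing with no error correction.

```python
from collections import defaultdict


def count_umis(reads):
    """Count unique UMIs per (barcode, gene) pair.

    `reads` is an iterable of (barcode, umi, gene) tuples (a hypothetical
    layout chosen for illustration). Reads sharing barcode, gene, and UMI
    are treated as amplification duplicates and counted once.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy input: the first two reads are duplicates of the same molecule.
reads = [
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "TTGA", "GeneA"),
    ("AAAC", "CCGT", "GeneA"),
    ("GGTC", "TTGA", "GeneA"),
    ("GGTC", "ACGT", "GeneB"),
]
counts = count_umis(reads)
```

Using a set per (barcode, gene) key means duplicate reads of one molecule add nothing to the count, which is the sense in which UMIs alleviate amplification bias.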
Project description:Site-specific regulation of protein N-glycosylation is essential in human cells. However, accurate quantification of glycosylation sites and their individual glycan moieties in a cell-wide manner is still technically challenging. Here, we introduce SugarQuant, an integrated mass spectrometry-based pipeline comprising fast protein aggregation capture (PAC)-based sample preparation, optimized multi-notch MS3 LC-MS acquisition (Glyco-SPS-MS3) and a data-processing tool (GlycoBinder) that allows for confident, global identification and quantification of intact glycopeptides in complex biological samples. PAC greatly reduces the overall sample-handling time without compromising sensitivity. Glyco-SPS-MS3 combines high-resolution MS2 and MS3 scans, resulting in enhanced reporter signals of isobaric mass tags, improved detection of N-glycopeptide fragments, and significantly lowered interference in multiplexed quantification. GlycoBinder enables streamlined processing of Glyco-SPS-MS3 data, followed by a two-step database search which increases the identification rates of intact glycopeptides by up to 22% when compared with one-step database search strategies. SugarQuant was applied to identify and quantify more than 5,000 unique glycoforms in Burkitt’s lymphoma cells, and determined complex site-specific glycosylation changes that occurred upon inhibition of fucosylation at high confidence.
Project description:Purpose: Develop an analytical pipeline for dynamic RNA-seq experiments and highlight the importance of considering the effect of surgery in MI-induced models. Methods: Mice underwent surgery, hearts were harvested, and RNA was extracted, sequenced and analyzed with the DESeq2 R package. Differentially expressed transcripts (DETs) were clustered by WGCNA and then analyzed to identify enriched biological processes and transcription factors (TFs). Results: Our pipeline enabled the detection of 1027 DETs. Enriched biological processes were mainly related to cell signaling and inflammation. IL-6 was identified as a key controller of the identified TFs and DETs. Immune cells were recruited to the myocardium at later time points after surgery, and some exhibited phenotypic changes.
Project description:In this work, we evaluated the genetic stabilization process of intraspecific (Saccharomyces cerevisiae) and interspecific (S. cerevisiae x Saccharomyces kudriavzevii) hybrids obtained by different non-GMO techniques, under fermentative conditions. Large-scale transitions in genome size, detected by measuring total DNA content, and genome reorganizations in both nuclear and mitochondrial DNA, evidenced by changes in molecular markers, were observed during the experiments. Interspecific hybrids seem to need fewer generations to reach genetic stability than intraspecific hybrids. The largest number of molecular patterns among the derived stable colonies was observed for intraspecific hybrids, particularly for those obtained by rare-mating, in which the total amount of initial DNA was larger. Finally, a representative stable intraspecific hybrid was subjected to a standard industrial process for active dry yeast (ADY) production, a step at which changes in genome composition could be induced. Comparative genome hybridization confirmed that the genetic composition of the hybrid did not change after this procedure. According to our results, fermentation steps 2 and 5 (comprising between 30 and 50 generations) suffice to obtain genetically stable interspecific and intraspecific hybrids, respectively. This work aimed to develop and validate a fast genetic stabilization method for newly generated Saccharomyces hybrids under selective enological conditions. The whole stabilization process was also compared between intra- and interspecific hybrids with different ploidy levels, which result from the different hybridization methodologies used. A stable hybrid strain was compared with itself before and after ADY production in order to evaluate the genetic stability of this strain.