Project description:RACE (Rapid Amplification of cDNA Ends) is a widely used approach for transcript identification. However, the dynamic range in the population of RACE transcript isoforms may be very large, and random clone selection (the typical approach) may be ineffective in sampling the different transcript species present in the population. Here, we describe an effective RACE sampling strategy. The products of the RACE reaction are hybridized onto high-density tiling arrays, and the exons detected are then used to delineate a series of RT-PCR reactions, through which the original RACE mixture is segregated into a number of simpler RT-PCR reactions. These are independently cloned, and randomly selected clones are sequenced. This approach is superior to the direct cloning and sequencing of the RACE products: it specifically targets novel transcripts, and often leads to the overall normalization of their abundances. We show theoretically that this strategy leads to a very efficient sampling of the novel transcript species associated with annotated loci. In a pilot experiment, we used this approach to discover many novel transcripts for a few otherwise well-characterized protein coding genes. Finally, we investigate how this strategy can be multiplexed for large-scale transcript discovery by high-density pooling of RACE reactions prior to hybridization. Our results indicate that through the interrogation of a limited number of exons per gene on a limited number of cell types, it is possible to recover a large fraction of the transcript diversity associated with protein coding loci. These loci, however, could be occupying a much larger genomic space than previously expected, implying that efficient multiplexing requires non-trivial pooling optimization.
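The sampling argument in the entry above can be illustrated with a minimal sketch. It is not the authors' theoretical model: it assumes clones are drawn independently with probability proportional to abundance, and the abundances, the grouping into reactions, and the clone budget are all hypothetical values chosen for illustration.

```python
def detection_probability(abundance: float, n_clones: int) -> float:
    """Probability that a species with this relative abundance (0..1)
    appears at least once among n_clones independently picked clones."""
    return 1.0 - (1.0 - abundance) ** n_clones


def expected_species_sampled(abundances, n_clones):
    """Expected number of distinct species seen when picking n_clones
    clones from a mixture with the given (unnormalized) abundances."""
    total = sum(abundances)
    return sum(detection_probability(a / total, n_clones) for a in abundances)


# Hypothetical RACE mixture: one dominant isoform and three rare ones.
mixture = [1000.0, 10.0, 5.0, 1.0]
clone_budget = 96

# Direct cloning: all clones are picked from the unsegregated mixture.
direct = expected_species_sampled(mixture, clone_budget)

# Hypothetical segregation into three simpler RT-PCR reactions,
# each cloned separately with an equal share of the clone budget.
reactions = [[1000.0], [10.0, 5.0], [1.0]]
clones_per_reaction = clone_budget // len(reactions)
segregated = sum(
    expected_species_sampled(r, clones_per_reaction) for r in reactions
)

print(f"direct cloning:      {direct:.2f} species expected")
print(f"segregated reactions: {segregated:.2f} species expected")
```

Under these assumptions the segregated design recovers more distinct species for the same number of sequenced clones, because the dominant isoform no longer competes with the rare ones for clone picks.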
Project description:Long non-coding RNAs (lncRNA) constitute a large fraction of mammalian transcriptomes that still remains unexplored, mainly due to the lack of comprehensive, high-quality lncRNA annotation that limits the possibility to fully explore their functional capacity. We have developed RACE-seq, an experimental workflow based on RACE (Rapid Amplification of cDNA Ends) and long read RNA sequencing, aimed at both rare isoform discovery and better definition of gene boundaries. We applied 3' and 5' RACE-seq on 398 low-expressed GENCODE v7 lncRNA genes in seven human tissues (brain, testis, heart, kidney, liver, lung and spleen). The sequences obtained led to the discovery of 2,641 on-target, previously unknown alternative transcripts. Novel isoforms extended 60% of the 398 targeted lncRNA loci further in either 5' or 3', and often reached genome hallmarks typical of gene boundaries. In parallel, we used nested RACE-seq, and confirmed that nested RACE-seq has overwhelmingly better sensitivity than its standard counterpart.
Project description:Targeted proteomics by selected/multiple reaction monitoring or, on a larger scale, by SWATH MS relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of critical importance. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches, generation of consensus spectra and compilation of mass spectrometric coordinates that uniquely define each targeted peptide. Crucial steps of this process such as FDR control, retention time normalization and handling of post-translationally modified peptides are discussed in detail. Finally we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2-3 days to complete, depending on the extent of the library and the computational resources available.
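The FDR control step named in the entry above is commonly done with a target-decoy estimate. The sketch below is a simplified illustration of that general idea, not the protocol's actual procedure: the scores, the decoy labels, the 25% cutoff, and both function names are hypothetical.

```python
def estimate_fdr(psms, threshold):
    """Target-decoy FDR estimate: decoy hits divided by target hits
    among peptide-spectrum matches scoring at or above the threshold.
    psms is a list of (score, is_decoy) pairs."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return float("inf") if decoys else 0.0
    return decoys / targets


def lowest_threshold_within_fdr(psms, max_fdr):
    """Lowest observed score whose estimated FDR is within max_fdr,
    or None if no threshold qualifies."""
    for threshold in sorted({score for score, _ in psms}):
        if estimate_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical scored matches: (search score, matched a decoy sequence?)
psms = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.4, False),
]

cutoff = lowest_threshold_within_fdr(psms, max_fdr=0.25)
print("score cutoff:", cutoff)
```

The raw estimate is not monotone in the threshold, so production tools usually convert it to q-values before filtering; this sketch skips that step for brevity.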
Project description:This study provides a comprehensive transcript analysis of the developing mouse retina with a focus on normalization, differential expression, cell-type-specific gene expression, transcription factor co-expression, alternative splicing, and novel transcript discovery. RNA-seq profiling was performed on four embryonic and eight postnatal time points during mouse retinal development and maturation. Alignment, transcript quantitation, normalization, differential expression analysis, alternative splice usage analysis, and novel transcript discovery were performed. Quantitative analysis revealed that 25,901 transcripts corresponding to 13,714 genes were expressed in the mouse retina between embryonic day 11 and postnatal day 28. Of these expressed transcripts, 12,075 (from 10,069 genes) were significantly differentially expressed at some point during development, corresponding to ~73% of the expressed genes.
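The ~73% figure in the entry above follows directly from the gene counts it quotes. A short check, using only the four numbers stated in the entry (the variable names are illustrative):

```python
# Counts quoted in the entry above.
expressed_transcripts = 25_901
expressed_genes = 13_714
de_transcripts = 12_075  # significantly differentially expressed
de_genes = 10_069

# Fraction of expressed genes with differential expression.
gene_fraction = de_genes / expressed_genes

# The same ratio at the transcript level, for comparison.
transcript_fraction = de_transcripts / expressed_transcripts

print(f"differentially expressed genes:       {gene_fraction:.1%}")
print(f"differentially expressed transcripts: {transcript_fraction:.1%}")
```

The gene-level ratio reproduces the quoted value; the transcript-level ratio is noticeably lower, which is why the entry specifies that the percentage refers to genes.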
Project description:Normalization of RNA-sequencing data is essential for accurate downstream inference, but the assumptions upon which most methods are based do not hold in the single-cell setting. Consequently, applying existing normalization methods to single-cell RNA-seq data introduces artifacts that bias downstream analyses. To address this, we introduce SCnorm for accurate and efficient normalization of scRNA-seq data.
Project description:We map the transcription start sites of 1085 murine olfactory receptor genes and analyze putative promoters. The bar files contain MAT analysis of RLM-RACE products hybridized to a custom olfactory genome tiling array. RNA from the olfactory epithelia of adult mice was prepared by RLM-RACE. Olfactory receptor transcripts were amplified by degenerate priming and hybridized to tiling arrays to map 5' transcript structure.
Project description:The Affymetrix GeneChip Wheat Genome Array currently provides the most comprehensive coverage of the wheat genome for a microarray. In addition to using this resource for transcript expression studies and hybridization-based DNA marker discovery, we endeavored to use the GeneChip to discover the expression of natural antisense transcript (NAT) pairs. By using alternative target preparation schemes, both the sense- and antisense-strand derived transcripts were labeled and hybridized to the Wheat GeneChip. To enable maximum discovery, five different tissue types were selected for assay, and the wheat cultivar ‘Chinese Spring’ was used considering that most of the GeneChip probe sequences were based on sequencing of this genome. [PLEXdb(http://www.plexdb.org) has submitted this series at GEO on behalf of the original contributor, Tristan Coram. The equivalent experiment is TA21 at PLEXdb.]
Project description:SWATH is a mass spectrometry data acquisition strategy that relies on peptide spectral libraries to perform quantitatively accurate and consistent measurement of proteins across multiple samples. Public libraries that have accelerated biomarker discovery have been developed for humans and several laboratory species, and similarly comprehensive resources would be useful for other species. The Veterinary Proteome Browser, VPBrowse (http://browser.proteo.cloud/), is an on-line platform developed for genome-based representation of the Bos taurus proteome and is equipped with an interactive database and tools for visualization and building quantitative mass spectrometry assays. VPBrowse contains high quality tandem mass spectrometry (MS/MS) spectra acquired on a QToF instrument for over 36,000 proteotypic peptides corresponding to over 10,000 bovine proteins. Data can be downloaded in different formats to enable analysis using popular software packages for SWATH data processing, whilst normalization to the iRT scale ensures compatibility with diverse chromatography systems. When applied to 25 different tissues and body fluids, the resource supported label-free quantification of nearly 30% of the protein-coding genes annotated in the bovine section of UniprotKB.
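Normalization to the iRT scale, mentioned in the entry above, is typically a linear mapping fitted on a set of reference peptides. The sketch below assumes an ordinary least-squares fit; the retention times, iRT values, and function names are hypothetical and are not taken from VPBrowse.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical reference peptides: retention time observed on one
# chromatography setup (minutes) and their assigned iRT values.
observed_rt = [10.0, 20.0, 30.0, 40.0]
reference_irt = [0.0, 25.0, 50.0, 75.0]

slope, intercept = fit_line(observed_rt, reference_irt)


def to_irt(retention_time: float) -> float:
    """Map a measured retention time onto the iRT scale."""
    return slope * retention_time + intercept


print(f"iRT = {slope:.2f} * RT + {intercept:.2f}")
print(f"a peptide eluting at 25.0 min maps to iRT {to_irt(25.0):.1f}")
```

Because library entries are stored on the dimensionless iRT scale, a new chromatography system only needs its own fit on the reference peptides to predict where each library peptide should elute.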