Project description:We present MultiEditR, the first algorithm specifically designed to detect and quantify RNA editing from Sanger sequencing (z.umn.edu/multieditr). Although RNA editing is routinely evaluated by measuring the heights of peaks in Sanger sequencing traces, the accuracy and precision of this approach have yet to be evaluated against gold-standard next-generation sequencing methods. Through a comprehensive comparison to RNA-seq and amplicon-based deep sequencing, we show that MultiEditR is accurate, precise, and reliable for detecting endogenous and programmable RNA editing.
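The conventional peak-height measurement mentioned in the description above can be sketched as follows. This is only the naive ratio of edited-base signal to combined signal at one trace position, not MultiEditR's own algorithm. The function name `editing_fraction` and the peak heights are hypothetical values chosen for illustration.

```python
def editing_fraction(peak_heights, ref_base, edited_base):
    """Naive estimate: edited-base peak height over reference + edited heights."""
    ref = peak_heights.get(ref_base, 0.0)
    edited = peak_heights.get(edited_base, 0.0)
    total = ref + edited
    if total == 0:
        raise ValueError("no signal for either base at this position")
    return edited / total


# Hypothetical peak heights at a single A-to-I site (inosine is read as G).
heights = {"A": 750.0, "C": 12.0, "G": 250.0, "T": 8.0}
print(editing_fraction(heights, "A", "G"))
```

A ratio like this ignores background noise in the other two channels, which is one reason a dedicated method may be preferable.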
Project description:Total RNA was extracted from zebrafish embryos from the SAT (Sanger AB Tübingen) strain. The RNA was DNase treated. The 3' ends of fragmented RNA were pulled down using polyT oligos attached to magnetic beads, reverse transcribed, made into Illumina libraries and sequenced using Illumina HiSeq paired-end sequencing. Protocol: Total RNA was extracted and DNase treated. Fragmented RNA was enriched for the 3' ends by pull-down using a polyT oligo attached to magnetic beads. An RNA oligo comprising part of the Illumina adapter 2 was ligated to the 5' end of the captured RNA and the RNA was eluted from the beads. Reverse transcription was primed with an anchored polyT oligo with part of Illumina adapter 1 at the 5' end followed by 12 random bases, then an 8-base indexing tag, then CG and 14 T bases. An Illumina library with full adapter sequence was produced by PCR. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
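The primer layout described in the protocol above (12 random bases, an 8-base indexing tag, CG, then 14 T bases) could be split out of a read roughly as sketched below. The sketch assumes the read begins exactly at the first random base, which the description does not state. The names `READ_LAYOUT` and `split_read` are hypothetical and not part of any published pipeline.

```python
import re

# Assumed layout at the start of the read, taken from the primer description:
# 12 random bases, 8-base indexing tag, the dinucleotide CG, then 14 T bases.
READ_LAYOUT = re.compile(r"^([ACGTN]{12})([ACGTN]{8})CG(T{14})(.*)$")


def split_read(seq):
    """Return the random bases, index tag and remaining insert, or None if no match."""
    match = READ_LAYOUT.match(seq)
    if match is None:
        return None
    random_bases, index_tag, _poly_t, insert = match.groups()
    return {"random_bases": random_bases, "index_tag": index_tag, "insert": insert}


# Made-up read built to follow the assumed layout.
example = "ACGTACGTACGT" + "AACCGGTT" + "CG" + "T" * 14 + "GATTACA"
print(split_read(example))
```

Reads that do not follow the layout return `None`, so they can be counted or discarded separately.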
Project description:Total RNA was extracted from zebrafish embryos from the SAT (Sanger AB Tübingen) strain. The RNA was DNase treated. Stranded RNAseq libraries were constructed using the Illumina TruSeq Stranded RNA protocol after treatment with Ribozero. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
Project description:IGHV mutation status is a well-established prognostic factor in chronic lymphocytic leukemia (CLL), and also provides crucial insights into tumor cell biology and function. Currently, determination of the IGHV transcript sequence, from which mutation status is calculated, requires a specialized laboratory procedure. RNA sequencing is a method that provides high-resolution, high-dynamic-range transcriptome data that can be used for differential expression, isoform discovery, and variant determination. In this paper, we demonstrate that unselected next-generation RNA sequencing can accurately determine the IGH@ sequence, including the complete sequence of the complementarity-determining region 3 (CDR3), and the mutation status of CLL cells, potentially replacing the current method, which is a specialized, single-purpose Sanger sequencing-based test.
Project description:Unmapped reads are usually considered useless and are discarded or ignored. Here, we develop a strategy to recover the full-length sequence behind an unmapped read by combining specific reverse transcription primer design with high-throughput sequencing. In this study, we salvage 36 unmapped reads from standard RNA-Seq data (GSM3188619) and randomly select one 149 bp read as a model (CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCA). Specific reverse transcription primers (5' end: CTGGTGCCATAATTCAGGGA, 3' end: GGATCTTCACGTAACGGATTGT) are designed to amplify both of its ends, followed by next-generation sequencing. We then use a statistical model based on a power-law distribution to estimate its completeness and significance, and we further validate the result by Sanger sequencing. The results show that the full-length sequence is 1,556 bp, with an indel mutation in a microsatellite structure. This strategy could be useful for extracting sequence information from unmapped RNA-seq data.
Project description:Sequencing-by-synthesis technologies can reduce the cost of generating de novo genome assemblies. We report a method for assembling draft genome sequences of eukaryotic organisms that integrates sequence information from different sources, and demonstrate its effectiveness by assembling an approximately 32.5 Mb draft genome sequence for the forest pathogen Grosmannia clavigera, an ascomycete fungus. We also developed a method for assessing draft assemblies using Illumina paired-end read data and demonstrate how we are using it to guide future sequence finishing. Our results demonstrate that eukaryotic genome sequences can be accurately assembled by combining Illumina, 454 and Sanger sequence data.