Project description:Unique Molecular Identifiers (UMIs) are random oligonucleotide barcode sequences that are critical for the removal of PCR amplification biases in both bulk and single-cell sequencing experiments. However, the impact that PCR and sequencing errors have on the accuracy of generating absolute counts of RNA molecules is underappreciated. We demonstrate that PCR errors, and not sequencing errors, are the main source of inaccuracy in sequencing data, and that the use of UMIs synthesized with homotrimeric nucleoside building blocks provides a way to pinpoint and remove errors, allowing absolute counting of sequenced molecules.
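As an illustration of how homotrimeric UMIs could help pinpoint errors, the sketch below assumes that each UMI position is encoded as three identical bases and that an error within a block can be resolved by a majority vote. The description does not spell out the correction rule, so the function name `collapse_homotrimer_umi`, the majority vote, and the use of `N` for unresolvable blocks are assumptions made for this example only.

```python
from collections import Counter


def collapse_homotrimer_umi(umi: str) -> str:
    """Collapse a homotrimer-encoded UMI to one base per three-base block.

    Assumption: each original UMI position was synthesized as three
    identical bases, so a single substitution within a block can be
    outvoted by the two remaining bases.
    """
    if len(umi) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")

    collapsed = []
    for start in range(0, len(umi), 3):
        block = umi[start:start + 3]
        base, count = Counter(block).most_common(1)[0]
        # Three different bases give no majority; flag the position as ambiguous.
        collapsed.append(base if count >= 2 else "N")
    return "".join(collapsed)
```

A block such as `AAT` collapses to `A`, whereas a block with three different bases cannot be resolved and is reported as `N` so that downstream counting can decide how to treat it.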
Project description:Molecule counting is central to single-cell sequencing, yet no experimental strategy to evaluate counting performance exists. Here, we introduce RNA spike-ins containing inbuilt unique molecular identifiers (molecular spikes) that we use to monitor single-cell RNA counting performance across methods and to identify experimental steps essential for accurate counting. In this dataset, we add molecular spikes to popular single-cell RNA-seq protocols: SCRB-seq, Smart-seq3 and 10x Genomics (v2). For SCRB-seq and Smart-seq3, we also include variations of the library preparation procedure that are suspected to change UMI counting accuracy.
Project description:Several template DNA molecules with random base molecular barcodes were amplified and sequenced, and the efficacy of the random base barcode for digital counting was shown.
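Digital counting with random barcodes can be sketched as counting distinct barcodes per template rather than counting reads. The example below is a minimal sketch under the assumption that each read has already been reduced to a `(template_id, barcode)` pair; the function name `count_molecules` and the input layout are hypothetical, and no barcode error correction is attempted.

```python
from collections import defaultdict


def count_molecules(reads):
    """Estimate molecule counts per template from (template_id, barcode) pairs.

    Reads that share both template and barcode are treated as PCR copies
    of one original molecule, so only distinct barcodes are counted.
    """
    barcodes_by_template = defaultdict(set)
    for template_id, barcode in reads:
        barcodes_by_template[template_id].add(barcode)
    return {t: len(b) for t, b in barcodes_by_template.items()}


reads = [
    ("T1", "ACGT"),
    ("T1", "ACGT"),  # duplicate of the first read, counted once
    ("T1", "TTTT"),
    ("T2", "GGGG"),
]
counts = count_molecules(reads)
```

Here `T1` has three reads but only two distinct barcodes, so it is counted as two molecules.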
Project description:To analyse gene expression patterns in different disease states of COVID-19 patients. Experimental workflow: 1) Small RNA enrichment and purification, 2) Adaptor ligation and addition of unique molecular identifier (UMI)-labeled primers, 3) RT-PCR, library quantitation and pooling cyclization, 4) Library quality control, 5) Small RNAs were sequenced on the BGI500 platform with 50 bp single-end reads, resulting in at least 20M reads for each sample. Analysis steps: 1) Raw small RNA sequencing reads with low-quality tags (more than four bases with a quality below ten, or more than six bases with a quality below thirteen), reads with poly A tags, tags without a 3’ primer and tags shorter than 18 nt were removed. 2) After data filtering, the clean reads were mapped to the reference genome and to other sRNA databases, including miRBase, siRNA, piRNA and snoRNA, using Bowtie2 (Langmead and Salzberg, 2012). In particular, cmsearch (Nawrocki and Eddy, 2013) was used for Rfam mapping. 3) Small RNA expression levels were calculated by counting absolute numbers of molecules using unique molecular identifiers (UMI, 8-10 nt). miRNAs with a UMI count larger than 1 in at least one sample were considered expressed.
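The quality and length thresholds and the expression cutoff stated in the analysis steps can be written down as a short sketch. This is not the pipeline used for the dataset: the function names and input layouts are hypothetical, per-base qualities are assumed to be already decoded to integers, and the poly A and 3’ primer filters are left out because the description gives no parameters for them.

```python
def is_low_quality(quals):
    """True if more than four bases have quality < 10,
    or more than six bases have quality < 13."""
    below_10 = sum(q < 10 for q in quals)
    below_13 = sum(q < 13 for q in quals)
    return below_10 > 4 or below_13 > 6


def keep_read(seq, quals, min_length=18):
    """Keep a read that is at least min_length nt long and not low quality."""
    return len(seq) >= min_length and not is_low_quality(quals)


def expressed_mirnas(umi_counts):
    """Return miRNAs with a UMI count larger than 1 in at least one sample.

    umi_counts: {mirna_name: {sample_name: umi_count}} (assumed layout).
    """
    return {
        mirna
        for mirna, per_sample in umi_counts.items()
        if any(count > 1 for count in per_sample.values())
    }
```

Note that both thresholds are strict: a read with exactly four bases below quality ten is kept, while one with five such bases is removed.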
Project description:Large-scale sequencing of RNAs from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states. However, current single-cell RNA-sequencing (scRNA-seq) methods have limited ability to count RNAs at allele- and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells. Here, we introduce Smart-seq3 that combines full-length transcriptome coverage with a 5’ unique molecular identifier (UMI) RNA counting strategy that enabled in silico reconstruction of thousands of RNA molecules per cell. Importantly, a large portion of counted and reconstructed RNA molecules could be directly assigned to specific isoforms and allelic origin, and we identified significant transcript isoform regulation in mouse strains and human cell types. Moreover, Smart-seq3 showed a dramatic increase in sensitivity and typically detected thousands more genes per cell than Smart-seq2. Altogether, we developed a short-read sequencing strategy for single-cell RNA counting at isoform and allele-resolution applicable to large-scale characterization of cell types and states across tissues and organisms.