Project description:Droplet-based single-cell sequencing techniques have provided unprecedented insight into cellular heterogeneity within tissues. However, these approaches only measure one end of each transcript after short-read sequencing, so splicing and sequence-diversity information is lost for most of the transcript. Applying long-read Nanopore sequencing to droplet-based methods is challenging because of the low base-calling accuracy currently associated with Nanopore sequencing. Several approaches have been developed that use additional short-read sequencing to error-correct the barcode and UMI sequences, but they are limited by the requirement to sequence the same library with both short- and long-read sequencing. Here we introduce a novel approach termed single-cell Barcode UMI Correction sequencing (scBUC-seq) to efficiently error-correct barcode and UMI oligonucleotide sequences synthesized using blocks of dimeric nucleotides. The method can be applied to both short-read and long-read sequencing data, allowing users to recover more reads per cell and, for the first time, permitting direct single-cell Nanopore sequencing. We illustrate the method using species-mixing experiments to evaluate barcode assignment accuracy, multiple myeloma cell lines to evaluate differential isoform usage, and Ewing’s sarcoma cells to demonstrate Ig fusion transcript analysis.
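The benefit of dimer-block synthesis can be illustrated with a small sketch. Assuming (as an illustration, not the published scBUC-seq design) that every barcode position is synthesized as a homodimer block such as AA, CC, GG or TT, a correctly read block contains two identical bases, so a block whose bases disagree pinpoints an error and leaves only two candidate bases at that position. The functions `decode_dimer_barcode` and `correct_barcode` and the whitelist lookup are hypothetical.

```python
from __future__ import annotations

from itertools import product


def decode_dimer_barcode(read_bc: str) -> list[str]:
    """Collapse a dimer-block barcode read into candidate monomer barcodes.

    Assumes (hypothetically) that each synthesis block is a homodimer, so a
    valid block reads as two identical bases. A mismatched block is ambiguous:
    either of its two observed bases may be the true one.
    """
    if len(read_bc) % 2 != 0:
        raise ValueError("dimer-block barcode must have even length")
    options = []
    for i in range(0, len(read_bc), 2):
        a, b = read_bc[i], read_bc[i + 1]
        # Matching block: unambiguous. Mismatch: keep both observed bases.
        options.append([a] if a == b else sorted({a, b}))
    return ["".join(p) for p in product(*options)]


def correct_barcode(read_bc: str, whitelist: set[str]) -> str | None:
    """Return the single whitelisted candidate, or None if 0 or >1 match."""
    hits = [c for c in decode_dimer_barcode(read_bc) if c in whitelist]
    return hits[0] if len(hits) == 1 else None
```

For example, the read `AACCGT` collapses to the candidates `ACG` and `ACT`, and a whitelist containing only one of them resolves the error. Insertions and deletions, which are common in Nanopore reads, shift the block frame and are not handled by this sketch.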
Project description:Barcode swapping results in the mislabeling of sequencing reads between multiplexed samples on Illumina sequencing machines that use the new patterned flow cells. This may compromise the validity of numerous genomic assays, especially single-cell studies, in which many samples are routinely multiplexed together. The severity and consequences of barcode swapping for single-cell transcriptomic studies remain poorly understood. We used two statistical approaches to robustly quantify the fraction of swapped reads in each of two plate-based single-cell RNA sequencing datasets. We found that approximately 2.5% of reads were mislabeled between samples on the HiSeq 4000 machine, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA sequencing studies. To eliminate these artefacts, we developed an algorithm that excludes individual molecules that have swapped between samples in 10X Genomics experiments by exploiting the combinatorial complexity present in the data. This permits the continued use of cutting-edge sequencing machines for droplet-based experiments while avoiding the confounding effects of barcode swapping. This data repository contains the sequencing files associated with the droplet-based scRNA-seq dataset in Griffiths et al. (2018). The data presented here should be used purely for technical analysis; the biological motivation is nonetheless briefly described below. The mammary gland is a unique organ as it undergoes most of its development during puberty and adulthood. Characterising the hierarchy of the various mammary epithelial cells and how they are regulated in response to gestation, lactation and involution is important for understanding how breast cancer develops.
Recent studies have used numerous markers to enrich, isolate and characterise the different epithelial cell compartments within the adult mammary gland. However, all of these studies used only a handful of markers to define and trace cell populations. Therefore, there is a need for an unbiased and comprehensive description of mammary epithelial cells within the gland at different developmental stages. To this end, we used single-cell RNA sequencing (scRNAseq) to determine the gene expression profiles of individual mammary epithelial cells across four adult developmental stages: nulliparous, mid-gestation, lactation and post-weaning (full natural involution).
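The molecule-exclusion idea for 10X Genomics data can be sketched as follows. Each molecule is keyed by its cell barcode, UMI and gene; because this combination is very unlikely to arise independently in two samples, a key observed in several samples is treated as a product of swapping. In this sketch, which follows the spirit of the approach but is not the authors' code, a molecule is kept only in the sample that contributes at least a fraction `min_frac` of its reads (a hypothetical parameter, set here to 0.8) and is discarded everywhere else.

```python
from collections import defaultdict


def remove_swapped(counts, min_frac=0.8):
    """Drop molecules that appear to have swapped between samples.

    counts: dict mapping sample name -> dict of
            (cell_barcode, umi, gene) -> read count.
    Returns a dict with the same layout, keeping each molecule only in the
    sample that holds >= min_frac of its total reads (if any sample does).
    """
    # Gather, for every molecule key, its read count in each sample.
    per_key = defaultdict(dict)
    for sample, molecules in counts.items():
        for key, n in molecules.items():
            per_key[key][sample] = n

    cleaned = {sample: {} for sample in counts}
    for key, by_sample in per_key.items():
        total = sum(by_sample.values())
        best = max(by_sample, key=by_sample.get)
        if by_sample[best] / total >= min_frac:
            cleaned[best][key] = by_sample[best]
        # Otherwise no sample dominates, so the molecule is dropped everywhere.
    return cleaned
```

A molecule seen with 9 reads in one sample and 1 read in another is assigned to the first sample, whereas one split 3:3 between two samples is removed from both.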
Project description:Single-cell transcriptomics, which relies on incorporating barcodes and unique molecular identifiers (UMIs) into captured polyA+ mRNA, faces a significant challenge from synthesis errors in oligonucleotide capture sequences. These errors, which are especially problematic in long-read sequencing, impair the precise identification of sequences and cause inaccurate UMI deduplication. To mitigate this issue, we modified the oligonucleotide capture design to integrate an anchor between the barcode and the UMI, and a 'V' base anchor adjacent to the polyA capture region. This configuration is designed to be compatible with both short- and long-read sequencing technologies, improving UMI recovery and feature detection and thereby the efficacy of droplet-based sequencing methods.
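A read from an anchored capture design can be parsed by searching for the anchor rather than relying on fixed offsets, which tolerates synthesis errors that change the apparent barcode length. The sketch below assumes a hypothetical layout of barcode, anchor, UMI, a single 'V' base (IUPAC V: A, C or G, i.e. not T) and then the poly(T) stretch; the anchor sequence `ANCHOR` and `UMI_LEN` are placeholders, not the published design.

```python
import re

# Hypothetical design values for illustration only; substitute the real ones.
ANCHOR = "GACTCGATGC"
UMI_LEN = 12

# barcode | anchor | UMI | one V base (A, C or G) | poly(T) of at least 10 bases
READ_PATTERN = re.compile(
    r"^(?P<barcode>[ACGTN]+?)"
    + re.escape(ANCHOR)
    + r"(?P<umi>[ACGTN]{" + str(UMI_LEN) + r"})"
    + r"(?P<v>[ACG])T{10,}"
)


def parse_capture_read(seq):
    """Return (barcode, umi), or None if the read does not fit the layout."""
    m = READ_PATTERN.match(seq)
    if m is None:
        return None
    return m.group("barcode"), m.group("umi")
```

Requiring a non-T base immediately before the poly(T) stretch fixes where the UMI ends, so a UMI that happens to end in T is not confused with the start of the capture region.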
Project description:Single-cell whole-genome haplotyping allows simultaneous detection of haplotypes associated with monogenic diseases and of chromosome copy number, and has subsequently revealed mosaicism in embryos and embryonic stem cells. Methods such as karyomapping and haplarithmisis have been deployed as generic, genome-wide approaches to preimplantation genetic testing (PGT) and are replacing traditional PGT methods. While current methods primarily rely on SNP arrays, we envision that sequencing-based methods will become more accessible and cost-efficient. Here, we developed a novel sequencing-based methodology to haplotype and copy-number profile single cells. Following DNA amplification, genomic size and complexity are reduced through restriction enzyme digestion, and the DNA is genotyped through sequencing. The output of this single-cell genotyping-by-sequencing (scGBS) is the input for haplarithmisis, an algorithm we previously developed for SNP array-based single-cell haplotyping. We established technical parameters and developed an analysis pipeline enabling accurate concurrent haplotyping and copy-number profiling of single cells. We demonstrate its value in human blastomere and trophectoderm samples as an application of PGT for monogenic disorders. Furthermore, we demonstrate that the method works in other species by analyzing blastomeres of bovine embryos. Our scGBS method opens up the path to single-cell haplotyping of any species with a diploid genome and could make its way into the clinic as a PGT application.
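Complexity reduction by restriction digestion can be previewed in silico before choosing an enzyme. This sketch cuts a sequence at every occurrence of a recognition site and keeps only fragments inside a size window, mimicking size selection. The EcoRI site (G^AATTC) and the 100 to 500 bp window are illustrative assumptions, not the enzyme or parameters used for scGBS.

```python
import re


def digest(seq, site="GAATTC", cut_offset=1, min_len=100, max_len=500):
    """In-silico restriction digest followed by size selection.

    site:       recognition sequence (EcoRI by default, an assumption).
    cut_offset: position within the site where the top strand is cut
                (1 for EcoRI, which cuts G^AATTC).
    Returns the fragments whose length lies in [min_len, max_len].
    """
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping site occurrences.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    return [f for f in fragments if min_len <= len(f) <= max_len]
```

Running this over a reference genome for candidate enzymes gives a rough estimate of how many loci, and therefore how many SNPs, would fall inside the sequenced fraction.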
Project description:Single-cell transcriptomics has emerged as a powerful approach to dissecting phenotypic heterogeneity in complex, unsynchronized cellular populations. However, many important biological questions demand quantitative analysis of large numbers of individual cells. Hence, new tools are urgently needed for efficient, inexpensive, and parallel manipulation of RNA from individual cells. We report a simple microfluidic platform for trapping single-cell lysates in sealed, picoliter microwells capable of “printing” RNA on glass or capturing RNA on polymer beads. To demonstrate the utility of our system for single-cell transcriptomics, we developed a highly scalable technology for genome-wide, single-cell RNA-Seq. The current implementation of our device is pipette-operated, profiles hundreds of individual cells in parallel with library preparation costs of ~$0.10-$0.20/cell, and includes five lanes for simultaneous experiments. We anticipate that this system will ultimately serve as a general platform for large-scale single-cell transcriptomics, compatible with both imaging and sequencing readouts.
Series type: Expression profiling by high throughput sequencing.
A microfluidic device that pairs sequence-barcoded mRNA capture beads with individual cells was used to barcode the cDNA from each cell; the cDNA was then pooled, pre-amplified by in vitro transcription, and converted into an Illumina RNA-Seq library. Libraries were generated in parallel from ~600 individual cells, of which 396 cells from the U87 and MCF10a cell lines were analyzed extensively, and from ~500 individual cells, of which 247 cells from the U87 and WI-38 cell lines were analyzed extensively. Sequencing targeted the 3′ end of the transcript molecules. The first read contains the cell-identifying barcode that was present on the capture bead; the second read contains a unique molecular identifier (UMI), a lane-identifying barcode, and then the transcript sequence.
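This read layout can be turned into digital expression counts by grouping on the bead barcode from read 1 and collapsing duplicate UMIs from read 2. The barcode, UMI and lane-barcode lengths below, and the helpers `split_read2` and `count_umis`, are hypothetical placeholders, since the actual lengths are not stated here; gene assignment (normally obtained by aligning the transcript part of read 2) is taken as given.

```python
from collections import defaultdict

# Hypothetical lengths; replace with the real bead and primer design.
CELL_BC_LEN = 12   # cell barcode at the start of read 1
UMI_LEN = 6        # UMI at the start of read 2
LANE_BC_LEN = 4    # lane barcode immediately after the UMI


def split_read2(read2):
    """Split read 2 into (umi, lane_barcode, transcript_sequence)."""
    umi = read2[:UMI_LEN]
    lane = read2[UMI_LEN:UMI_LEN + LANE_BC_LEN]
    return umi, lane, read2[UMI_LEN + LANE_BC_LEN:]


def count_umis(records):
    """records: iterable of (read1, read2, gene) with gene already assigned.

    Returns {(lane, cell_barcode): {gene: number of distinct UMIs}}.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for read1, read2, gene in records:
        cell = read1[:CELL_BC_LEN]
        umi, lane, _ = split_read2(read2)
        seen[(lane, cell)][gene].add(umi)  # PCR/IVT duplicates collapse here
    return {k: {g: len(u) for g, u in genes.items()} for k, genes in seen.items()}
```

Counting distinct UMIs rather than reads removes the amplification bias introduced by in vitro transcription, because copies of one original molecule share a UMI.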
Project description:Barcode-based multiplexing methods can be used to increase throughput and reduce batch effects in large single-cell genomics studies. To evaluate methods for demultiplexing barcode-multiplexed data, we generated a dataset by labeling samples separately with barcode-tagged antibodies, mixing those samples, and progressively overloading a droplet-based scRNA-seq system.
2021-12-01 | GSE181862 | GEO
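A minimal demultiplexing rule for barcode-tagged antibody (hashtag) counts assigns each cell to its dominant tag and flags cells in which a second tag is also strong; such multi-tag droplets become more common as a droplet system is overloaded. The thresholds and the function `classify_cell` are illustrative assumptions; published demultiplexing methods use more careful statistical models.

```python
def classify_cell(tag_counts, min_total=10, doublet_frac=0.25):
    """Classify one cell from its hashtag counts.

    tag_counts:   dict mapping tag name -> UMI count for this cell.
    min_total:    cells with fewer total tag UMIs are called 'negative'.
    doublet_frac: if the second-strongest tag holds at least this fraction
                  of the total, the cell is called a 'doublet'.
    Returns 'negative', 'doublet', or the name of the assigned tag.
    """
    total = sum(tag_counts.values())
    if total < min_total:
        return "negative"
    ranked = sorted(tag_counts, key=tag_counts.get, reverse=True)
    if len(ranked) > 1 and tag_counts[ranked[1]] / total >= doublet_frac:
        return "doublet"
    return ranked[0]
```

Note that a rule like this can only flag doublets formed from differently tagged samples; two cells from the same sample in one droplet still look like a singlet.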
Project description:Xdrop: Targeted sequencing of long DNA molecules from low input samples using droplet sorting
Project description:In this study, linked read sequencing was performed on two ovarian metastases and matched normal tissue, from a patient with primary diffuse gastric cancer. Linked read sequencing is a DNA preparation technology whereby high molecular weight molecules of DNA are uniquely barcoded prior to fragmentation and sequencing, thus retaining information about genomic contiguity. This study performed an extended analysis of linked read sequencing data to resolve the complex structures of structural variants in the cancer genomes. Complex structural rearrangements were identified in the genomic region surrounding the known oncogene FGFR2, and the association between FGFR2 and gastric cancer metastasis was demonstrated in an organoid model.
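The contiguity information in linked reads comes from reads that share a barcode and lie close together on the genome. This sketch infers molecule extents for one barcode on one chromosome by splitting the sorted read positions wherever the gap exceeds a threshold; the 50 kb gap and the minimum of two reads per molecule are assumed values, not parameters from this study.

```python
def infer_molecules(positions, max_gap=50_000, min_reads=2):
    """Group read positions (one barcode, one chromosome) into molecules.

    Returns a list of (start, end, n_reads) tuples, one per inferred molecule.
    """
    molecules = []
    pos = sorted(positions)
    if not pos:
        return molecules
    start = prev = pos[0]
    n = 1
    for p in pos[1:]:
        if p - prev > max_gap:
            # Gap too large: close the current molecule and start a new one.
            if n >= min_reads:
                molecules.append((start, prev, n))
            start, n = p, 0
        n += 1
        prev = p
    if n >= min_reads:
        molecules.append((start, prev, n))
    return molecules
```

When many barcodes have inferred molecules that end near one genomic position and resume near a distant one, those regions were likely joined on the same DNA molecule, which is the kind of evidence linked reads provide for structural variants.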
Project description:DNA barcodes can be used to identify single cells in a sequencing data space, while optical codes can be used to track single live cells in an image data space. We have developed dual image and DNA (ID)-coding, which identifies individual single cells in both live-image and sequencing data spaces. Samples provided here are relevant to proof-of-concept studies of ID-coding presented in the associated publication. DNA-barcoded micro-particles were encapsulated in hydrogel droplets with or without single cells. The hydrogel droplets were then subjected to “single-droplet sequencing”, in which all polyA-bearing nucleic acids within a hydrogel droplet (i.e. mRNA from cells and synthetic DNA on beads) were tagged with the same cell barcode.
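Because the bead's synthetic DNA and the cell's mRNA in one droplet carry the same cell barcode, an optical code can be paired with a transcriptome by grouping reads on that barcode. In this sketch, the lookup table `optical_codes` (synthetic DNA sequence to optical-code label), the read tuple layout and the function `pair_codes` are hypothetical.

```python
from collections import Counter, defaultdict


def pair_codes(reads, optical_codes):
    """Link each cell barcode to an optical code and its transcript counts.

    reads:         iterable of (cell_barcode, insert_sequence, gene_or_None).
    optical_codes: dict mapping synthetic DNA sequence -> optical-code label.
    Returns {cell_barcode: (optical_code_or_None, Counter of genes)}.
    """
    codes = defaultdict(Counter)
    genes = defaultdict(Counter)
    for bc, insert, gene in reads:
        if insert in optical_codes:
            codes[bc][optical_codes[insert]] += 1  # bead-derived synthetic DNA
        elif gene is not None:
            genes[bc][gene] += 1                   # cell-derived mRNA
    result = {}
    for bc in set(codes) | set(genes):
        # Take the most frequent optical code seen for this barcode, if any.
        best = codes[bc].most_common(1)[0][0] if codes[bc] else None
        result[bc] = (best, genes[bc])
    return result
```

Barcodes with an optical code but no transcripts would correspond to droplets encapsulated without a cell.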
Project description:Transcription factors (TFs) direct gene expression, so there is much interest in mapping their genome-wide binding locations. Current methods do not allow multiplexed analysis of TF binding, which limits their throughput. We describe a novel method for determining the genomic target genes of multiple transcription factors simultaneously. DNA-binding proteins are endowed with the ability to direct transposon insertions into the genome near where they bind. The transposon becomes a “Calling Card” marking the visit of the DNA-binding protein to that location. A unique sequence “barcode” in the transposon matches it to the DNA-binding protein that directed its insertion. The sequences of the DNA flanking the transposon (which reveal where in the genome the transposon landed) and of the barcode within the transposon (which identifies the TF that put it there) are determined by massively parallel DNA sequencing. To demonstrate the method’s feasibility, we determined the genomic targets of eight transcription factors in a single experiment. The Calling Card method promises to significantly reduce the cost and labor needed to determine the genomic targets of many transcription factors in different environmental conditions and genetic backgrounds. These data contain Ty5 insertion sites, mapped in the S. cerevisiae genome using an Illumina GAII analyzer, for: the background strain without any Sir4 present (1 run); strains expressing Sir4-tagged copies of three well-characterized TFs, Gal4, Leu3 and Gcn4 (1 run each); a multiplex of eight Sir4-tagged TFs pooled in a single experiment (2 biological replicates); and the Thi2-Sir4 fusion expressed from its native locus in two conditions (1 run each). The format of each insertions file is [chromosome number] [position of genomic base] [direction of insertion] [number of reads at that position].
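The insertions files can be read with a few lines of code. This sketch assumes that the four fields are whitespace-separated, that the chromosome is written as an integer, and that the direction is a single token such as `+` or `-` (the exact encoding is not stated here). It then merges insertions on the same chromosome that lie within a hypothetical 200 bp window into clusters, a simple way to highlight candidate binding regions.

```python
from collections import namedtuple

Insertion = namedtuple("Insertion", "chrom pos strand reads")


def parse_insertions(lines):
    """Parse '[chromosome] [position] [direction] [reads]' lines."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        chrom, pos, strand, reads = fields
        out.append(Insertion(int(chrom), int(pos), strand, int(reads)))
    return out


def cluster_insertions(insertions, window=200):
    """Merge insertions within `window` bp of each other on one chromosome.

    Returns (chrom, start, end, n_insertions, total_reads) tuples.
    """
    clusters = []
    for ins in sorted(insertions, key=lambda i: (i.chrom, i.pos)):
        last = clusters[-1] if clusters else None
        if last and last[0] == ins.chrom and ins.pos - last[2] <= window:
            clusters[-1] = (last[0], last[1], ins.pos,
                            last[3] + 1, last[4] + ins.reads)
        else:
            clusters.append((ins.chrom, ins.pos, ins.pos, 1, ins.reads))
    return clusters
```

Comparing cluster read totals between a TF-tagged strain and the background strain without Sir4 is one way to separate TF-directed insertions from background integration.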
Raw sequencing data comes in two varieties. Paired-end data contains a 5 bp barcode at the beginning of read #2. Single-end data contains a 2 bp barcode at the beginning of read #1.
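Stripping the barcode from the raw reads follows directly from these two layouts. The hypothetical helpers `split_barcode` and `group_by_barcode` remove the 5 bp barcode from the start of read #2 for paired-end data, or the 2 bp barcode from the start of read #1 for single-end data, and bin the remaining sequences by barcode. Mapping barcodes to specific TFs requires the study's barcode table, which is not reproduced here.

```python
from collections import defaultdict

# Barcode lengths as stated in the data description.
BARCODE_LEN = {"paired": 5, "single": 2}


def split_barcode(seq, layout):
    """Return (barcode, remaining_sequence) for a 'paired' or 'single' read."""
    n = BARCODE_LEN[layout]
    return seq[:n], seq[n:]


def group_by_barcode(seqs, layout):
    """Bin reads (read #2 for paired-end, read #1 for single-end) by barcode."""
    groups = defaultdict(list)
    for s in seqs:
        bc, rest = split_barcode(s, layout)
        groups[bc].append(rest)
    return dict(groups)
```

For paired-end data, only read #2 should be passed in; read #1 carries no barcode in this layout.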