Single-cell RNA-seq data of mammary gland epithelial cells from different gestational stages to detect and remove barcode swapping
ABSTRACT: Barcode swapping results in the mislabeling of sequencing reads between multiplexed samples on the new patterned flow cell Illumina sequencing machines. This may compromise the validity of numerous genomic assays, especially for single-cell studies where many samples are routinely multiplexed together. The severity and consequences of barcode swapping for single-cell transcriptomic studies remain poorly understood. We have used two statistical approaches to robustly quantify the fraction of swapped reads in each of two plate-based single-cell RNA sequencing datasets. We found that approximately 2.5% of reads were mislabeled between samples on the HiSeq 4000 machine, which is lower than previous reports. We observed no correlation between the swapped fraction of reads and the concentration of free barcode across plates. Furthermore, we have demonstrated that barcode swapping may generate complex but artefactual cell libraries in droplet-based single-cell RNA sequencing studies. To eliminate these artefacts, we have developed an algorithm to exclude individual molecules that have swapped between samples in 10X Genomics experiments, exploiting the combinatorial complexity present in the data. This permits the continued use of cutting-edge sequencing machines for droplet-based experiments while avoiding the confounding effects of barcode swapping. This data repository contains the sequencing files associated with the droplet-based scRNA-seq dataset in Griffiths et al. (2018). The data presented here should be used purely for technical analysis; the biological motivation is nonetheless briefly described below. The mammary gland is a unique organ as it undergoes most of its development during puberty and adulthood. Characterising the hierarchy of the various mammary epithelial cells and how they are regulated in response to gestation, lactation and involution is important for understanding how breast cancer develops.
Recent studies have used numerous markers to enrich, isolate and characterise the different epithelial cell compartments within the adult mammary gland. However, in all of these studies only a handful of markers were used to define and trace cell populations. Therefore, there is a need for an unbiased and comprehensive description of mammary epithelial cells within the gland at different developmental stages. To this end, we used single-cell RNA sequencing (scRNA-seq) to determine the gene expression profile of individual mammary epithelial cells across four adult developmental stages: nulliparous, mid gestation, lactation and post weaning (full natural involution).
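The abstract describes excluding molecules that have swapped between 10X Genomics samples by exploiting the combinatorial complexity of the data. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the same combination of cell barcode, UMI and gene is very unlikely to arise independently in two multiplexed samples, so a combination seen in several samples is kept only in the sample that holds a clear majority of its reads. The function name `remove_swapped`, the input tuple layout and the `min_frac` threshold of 0.8 are illustrative assumptions.

```python
from collections import defaultdict


def remove_swapped(molecules, min_frac=0.8):
    """Filter likely swapped molecules (illustrative sketch).

    molecules: iterable of (sample, cell_barcode, umi, gene, n_reads).
    min_frac:  hypothetical threshold; a molecule is kept only in the
               sample holding at least this fraction of its reads.
    """
    # Group read counts by the (cell barcode, UMI, gene) combination.
    by_key = defaultdict(dict)
    for sample, cell, umi, gene, n_reads in molecules:
        counts = by_key[(cell, umi, gene)]
        counts[sample] = counts.get(sample, 0) + n_reads

    kept = []
    for (cell, umi, gene), counts in by_key.items():
        total = sum(counts.values())
        best_sample, best_reads = max(counts.items(), key=lambda kv: kv[1])
        # Keep the molecule in the dominant sample only; if no sample
        # dominates, the molecule is discarded from every sample.
        if best_reads / total >= min_frac:
            kept.append((best_sample, cell, umi, gene, best_reads))
    return sorted(kept)


molecules = [
    ("A", "AAAC", "TTG", "Krt8", 90),   # dominant copy, kept
    ("B", "AAAC", "TTG", "Krt8", 10),   # minor copy, treated as swapped
    ("A", "CCCG", "GGA", "Csn2", 5),    # ambiguous, dropped from both
    ("B", "CCCG", "GGA", "Csn2", 5),
    ("B", "GGGT", "ACA", "Krt18", 7),   # seen in one sample only, kept
]
filtered = remove_swapped(molecules)
```

Dropping ambiguous molecules from every sample is a conservative choice: it loses a little real signal but avoids assigning a molecule to the wrong library.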
Project description:Single-cell transcriptome profiling using a 3' droplet-based platform (Chromium, 10x Genomics) of CD11b+ cells isolated from the spleen of control and tumor-bearing mice, treated or not with IFN gene therapy.
Project description:Single-cell transcriptome profiling using a 3' droplet-based platform (Chromium, 10x Genomics) of human CD45+ leukocytes isolated from leukemic HuSGM3 mice infused with CD19.28z CAR-T cells, two days after cytokine release syndrome (CRS) onset and 5 days later.
Project description:We combined CRISPR genome editing with single-cell RNA sequencing to assess complex phenotypes in pooled cellular screens. Our method for CRISPR droplet sequencing (CROP-seq) comprises four key components: a gRNA vector that makes individual gRNAs detectable in single-cell transcriptomes, a high-throughput assay for single-cell RNA-seq, a computational pipeline for assigning single-cell transcriptomes to gRNAs, and a bioinformatic method for analyzing and interpreting gRNA-induced transcriptional profiles. CROP-seq allowed us to link gRNA expression to the associated transcriptome responses in thousands of single cells using a straightforward and broadly applicable screening workflow. Additional information is available from the CROP-seq website http://crop-seq.computational-epigenetics.org Overall design: A Drop-seq species-mixing experiment was performed with human HEK293T and mouse 3T3 cells in a 1:1 proportion as described by Macosko et al. For CROP-seq, Jurkat cells were transduced with a gRNA library targeting high-level regulators of T cell receptor signaling and a set of transcription factors. After 10 days of antibiotic selection and expansion, cells were stimulated with anti-CD3 and anti-CD28 antibodies or left untreated. Both conditions were analyzed using CROP-seq, measuring TCR activation for each gene knockout. Our dataset comprises 5,905 high-quality single-cell transcriptomes with uniquely assigned gRNAs. All CROP-seq raw data files are multiplexed with single-cell reads. Each read 1 contains the cell barcode (12 bp) and a molecule barcode (8 bp), and read 2 contains the transcriptome read. The libraries are pooled by nature but also intrinsically labelled. The file CROP-seq_Jurkat_TCR.digital_expression.csv.gz contains gene-level expression quantifications for each cell, where each cell corresponds to a cell barcode in read 1.
For the Drop-seq_HEK293T-3T3 sample (Drop-seq species mixing), reads aligning to the two genomes were used to quantify, for each cell barcode, the number of reads coming from each genome. Similarly, in the CROP-seq_HEK293T sample (CROP-seq gRNA mixing), the number of gRNA molecules detected per cell barcode was counted; this is possible because the gRNA-containing transcripts are polyadenylated when expressed from a Pol2 promoter, as engineered.
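The read layout and the species-mixing quantification described above can be sketched as follows. This is a minimal illustration, not the CROP-seq pipeline. It assumes the 12 bp cell barcode precedes the 8 bp molecule barcode in read 1 (the description gives the lengths but not the order), that each read already carries a genome label from alignment, and that a hypothetical `purity` threshold of 0.9 separates single-species barcodes from mixed ones. All function names are illustrative.

```python
from collections import Counter, defaultdict

CELL_BC_LEN = 12  # cell barcode length given in the description
UMI_LEN = 8       # molecule barcode length given in the description


def split_read1(read1):
    """Split read 1 into (cell barcode, molecule barcode).

    Assumes the cell barcode comes first, followed by the molecule barcode.
    """
    if len(read1) < CELL_BC_LEN + UMI_LEN:
        raise ValueError("read 1 is shorter than 20 bp")
    return read1[:CELL_BC_LEN], read1[CELL_BC_LEN:CELL_BC_LEN + UMI_LEN]


def reads_per_genome(alignments):
    """Count reads per genome for each cell barcode.

    alignments: iterable of (read1_sequence, genome_label) pairs.
    """
    counts = defaultdict(Counter)
    for read1, genome in alignments:
        cell, _umi = split_read1(read1)
        counts[cell][genome] += 1
    return counts


def classify(genome_counts, purity=0.9):
    """Label a barcode by its dominant genome, or 'mixed' below `purity`."""
    total = sum(genome_counts.values())
    genome, n_reads = genome_counts.most_common(1)[0]
    return genome if n_reads / total >= purity else "mixed"


alignments = (
    [("AAAAAAAAAAAACCCCCCCC", "human")] * 19
    + [("AAAAAAAAAAAAGGGGGGGG", "mouse")]
    + [("TTTTTTTTTTTTCCCCCCCC", "human"), ("TTTTTTTTTTTTCCCCCCCC", "mouse")]
)
counts = reads_per_genome(alignments)
labels = {cell: classify(c) for cell, c in counts.items()}
```

Barcodes labelled `mixed` are candidates for droplets that captured cells from both species; the same per-barcode counting applies to gRNA molecules in the gRNA-mixing sample.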
Project description:Adipose tissue in the mammary gland undergoes dramatic remodeling during reproduction. Adipocytes are replaced by mammary alveolar structures during pregnancy and lactation, then reappear upon weaning. Here, we reveal that adipocytes in the mammary gland de-differentiate into Pdgfrα+ preadipocyte- and fibroblast-like cells during pregnancy, and remain de-differentiated during lactation. Upon weaning, de-differentiated fibroblasts proliferate and re-differentiate into adipocytes. In order to determine the molecular signature of these de-differentiated adipocytes in the mammary gland, we compared these cells with classical adipocytes. Using the AdipoChaser-mT/mG system, we pre-labeled mature adipocytes with GFP expression to characterize the features of these de-differentiated adipocytes (Figure 4A), and then purified CD31-/CD45-/PDGFRα+/Tomato+ and CD31-/CD45-/PDGFRα+/GFP+ cells from the stromal vascular fraction (SVF) of lactating mammary gland at the peak of lactation through FACS. Gene expression analyses showed that the CD31-/CD45-/PDGFRα+/Tomato+ cells were indeed enriched with Tomato expression, while the CD31-/CD45-/PDGFRα+/GFP+ cells were enriched with GFP expression (Figure 4C). We then collected CD31-/CD45-/PDGFRα+/GFP+ cells as single cells for subsequent single-cell RNA-sequencing analysis (Figure 4D-G, Supplemental Figure S1A-G). After flow sorting and single-cell RNA amplification, 26 CD31-/CD45-/PDGFRα+/GFP+ cells passed quality control, and these cells were used for single-cell RNA-sequencing analysis. Due to technical difficulties in sorting single mature white adipocytes through flow cytometry, adipocytes differentiated from the immortalized murine-derived brown pre-adipocyte cell line were used as the mature adipocyte control (Pradhan et al., 2017). Additionally, we included population RNA-seq experiments, i.e. three mature white adipocyte samples, two GFP+ samples, and six GFP- samples.
Project description:We use RNA-sequencing to generate gene expression profiles of fetal mammary cells with unique sorting strategies. These analyses reveal that sorting fetal mammary cells with Sox10 and EpCAM sorting markers provides a stroma-free fMaSC-enriched cell population. The gene expression profiling of these cells offers a resource to probe the molecular mechanisms that specify this unique cell state. Examination of 2 different sorting strategies for fetal mammary cells
2015-08-04 | E-GEOD-71635 | ArrayExpress
Project description:Droplet Barcode Sequencing for Targeted Linked-Read Haplotyping of Single DNA Molecules
Project description:Here we compare the performance of three approaches (inDrop, Drop-seq and 10x) using the same kind of sample with a unified data processing pipeline. We generated 2-3 replicates for each method using lymphoblastoid cell line GM12891. The average sequencing depth was around 50-60k reads per cell barcode. We also developed a versatile and rapid data processing workflow and applied it to all datasets. Cell capture efficiency, effective read ratio, barcode detection error and transcript detection sensitivity were analyzed as well. Overall design: We used the human lymphoblastoid cell line GM12891 throughout the experiments, assuming homogeneity within the cell population. Biological replicates were set up for all three methods, inDrop, Drop-seq and 10X Genomics Chromium (10X), with various cell inputs on different days and in different batches
Project description:We describe an application of deep sequencing and de novo assembly of short RNA reads to investigate small interfering RNA (siRNA)-mediated immunity in leaf samples from eight tree taxa naturally occurring in Wytham Woods, Oxfordshire, UK. A BLAST search of GenBank for homologues of the contigs identified siRNA populations against a number of RNA viruses and Ty1-copia retrotransposons in these tree species. Small RNA sequencing and de novo assembly
Project description:Recent advances in RNA sequencing (RNA-Seq) have enabled the discovery of novel transcriptomic variations that cannot be detected with traditional microarray-based methods. Tissue- and cell-specific transcriptome changes during pathophysiological stress, in disease cases versus controls and in response to therapies are of particular interest to investigators studying cardiometabolic diseases. Thus, knowledge of the relationship between sequencing depth and detection of transcriptomic variation is needed for designing RNA-Seq experiments and for interpreting results of analyses. Using deeply sequenced RNA-Seq data derived from adipose of a healthy individual before and after systemic administration of endotoxin (LPS), we investigated the sequencing depths needed for studies of gene expression and alternative splicing (AS). We found that to detect expressed genes and AS events, ~100 million (M) filtered reads were needed. However, the sequencing depth required to detect LPS-modulated differential expression (DE) and differential alternative splicing (DAS) was much higher. To detect 80% of events, ~300M filtered reads were needed for DE analysis, whereas at least 400M filtered reads were necessary for detecting DAS. Although the majority of expressed genes and AS events can be detected with modest sequencing depths (~100M filtered reads), the estimated gene expression levels and exon/intron inclusion levels were less accurate. We report the first study that evaluates the relationship between RNA-Seq depth and the ability to detect DE and DAS in human adipose. Our results suggest that a much higher sequencing depth is needed to reliably identify DAS events than DE genes. Random sampling of the RNA-seq data at different depths for gene and alternative-splicing analysis
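The random sampling design mentioned above can be illustrated with a small sketch. It is not the study's workflow: it assumes each read has already been assigned to a gene, draws reads without replacement at several target depths, and counts the genes supported by at least `min_reads` reads. The function name, the `min_reads` cutoff and the toy data are all hypothetical.

```python
import random
from collections import Counter


def detected_genes_at_depths(read_genes, depths, min_reads=1, seed=0):
    """Count detected genes after downsampling to each depth.

    read_genes: list with one gene label per aligned read.
    depths:     target numbers of reads to sample without replacement.
    min_reads:  hypothetical cutoff for calling a gene detected.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    detected = {}
    for depth in depths:
        if depth > len(read_genes):
            raise ValueError("depth exceeds the number of available reads")
        subsample = rng.sample(read_genes, depth)
        per_gene = Counter(subsample)
        detected[depth] = sum(1 for n in per_gene.values() if n >= min_reads)
    return detected


# Toy library: three genes with very different abundances.
reads = ["geneA"] * 900 + ["geneB"] * 95 + ["geneC"] * 5
saturation = detected_genes_at_depths(reads, depths=[10, 100, 1000])
```

Plotting the number of detected features against depth gives a saturation curve; the same resampling could be applied to differential expression or splicing calls, which the description reports require far greater depth.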