Project description:Background: Single-cell RNA sequencing experiments commonly use 10x Genomics (10x) kits due to their high-throughput capacity and standardized protocols. Recently, Parse Biosciences (Parse) introduced an alternative technology that uses multiple in-situ barcoding rounds within standard 96-well plates. Parse enables the analysis of more cells from multiple samples in a single run without the need for additional reagents or specialized microfluidics equipment. To evaluate the performance of both platforms, we conducted a benchmark study using biological and technical replicates of mouse thymus as a complex immune tissue. Results: We found that Parse detected nearly twice the number of genes compared to 10x, with each platform detecting a distinct set of genes. The comparison of multiplexed samples generated from 10x and Parse techniques showed 10x data to have lower technical variability and more precise annotation of biological states in the thymus compared to Parse. Conclusion: Our results provide a comprehensive comparison of the suitability of both single-cell platforms for immunological studies.
Project description:Age prediction based on single-cell RNA sequencing (scRNA-Seq) data can provide information about patients' susceptibility to various diseases and conditions. In addition, such analysis can be used to identify aging-related genes and pathways. To enable age prediction based on scRNA-Seq data, we developed PolyEN, a new regression model which learns a continuous representation of expression over time. These representations are then used by PolyEN to integrate genes to predict an age. Existing lung aging data and new data we profiled demonstrated PolyEN's improved performance over existing methods for age prediction. Our results identified lung epithelial cells as the most significant predictors for non-smokers, while lung endothelial cells led to the best chronological age prediction results for smokers.
Project description:Despite the growing availability of sophisticated bioinformatic methods for the analysis of single-cell RNA-seq data, few tools exist that allow biologists without extensive bioinformatic expertise to directly visualize and interact with their own data and results. Here, we present Cerebro (cell report browser), a Shiny- and Electron-based standalone desktop application for macOS and Windows which allows investigation and inspection of pre-processed single-cell transcriptomics data without requiring bioinformatic experience of the user. Through an interactive and intuitive graphical interface, users can (i) explore similarities and heterogeneity between samples and cell clusters in two-dimensional or three-dimensional projections such as t-SNE or UMAP, (ii) display the expression level of single genes or gene sets of interest, (iii) browse tables of most expressed genes and marker genes for each sample and cluster and (iv) display trajectories calculated with Monocle 2. We provide three examples prepared from publicly available datasets to show how Cerebro can be used and what its capabilities are. Through a focus on flexibility and direct access to data and results, we think Cerebro offers a collaborative framework for bioinformaticians and experimental biologists that facilitates effective interaction to shorten the gap between analysis and interpretation of the data. Availability and implementation: The Cerebro application, additional documentation, and example datasets are available at https://github.com/romanhaa/Cerebro. Similarly, the cerebroApp R package is available at https://github.com/romanhaa/cerebroApp. All components are released under the MIT License. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:The linked-read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using Illumina short-read sequencing technology. In this approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from those of the chloroplast, which simplified the de Bruijn graphs used at the various stages of assembly.
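The read selection step described above, keeping only reads whose index sequence is associated with many reads, can be illustrated with a minimal sketch. The abstract does not state the cutoff or the tooling that was used, so the function name `select_high_frequency_reads`, the `(index, read)` tuple layout, and the threshold value are all hypothetical.

```python
from collections import Counter


def select_high_frequency_reads(reads, min_reads_per_index):
    """Keep reads whose index sequence occurs at least min_reads_per_index times.

    reads: list of (index_sequence, read_sequence) tuples (hypothetical layout).
    """
    # Count how many reads carry each index sequence.
    counts = Counter(index for index, _ in reads)
    # Retain only reads from frequently observed indices.
    return [(index, seq) for index, seq in reads if counts[index] >= min_reads_per_index]


# Toy data: the index "AAAC" is seen three times, the others once each.
reads = [
    ("AAAC", "GATTACA"),
    ("AAAC", "TTGACCA"),
    ("AAAC", "CCGTAGG"),
    ("GGTT", "ACGTACG"),
    ("CCTA", "TTTTGGA"),
]
selected = select_high_frequency_reads(reads, min_reads_per_index=3)
```

In this toy input only the three reads tagged with the frequent index survive the filter; a real cutoff would have to be chosen from the observed index frequency distribution.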
Project description:Background: Single-cell RNA sequencing (scRNA-seq) has emerged as a main strategy to study transcriptional activity at the cellular level. Clustering analysis is routinely performed on scRNA-seq data to explore, recognize or discover underlying cell identities. The high dimensionality of scRNA-seq data and its significant sparsity, accentuated by frequent dropout events that introduce false zero-count observations, make the clustering analysis computationally challenging. Even though multiple scRNA-seq clustering techniques have been proposed, there is no consensus on the best performing approach. On a parallel research track, self-supervised contrastive learning recently achieved state-of-the-art results on image clustering and, subsequently, image classification. Results: We propose contrastive-sc, a new unsupervised learning method for scRNA-seq data that performs cell clustering. The method consists of two consecutive phases: first, an artificial neural network learns an embedding for each cell through a representation training phase. The embedding is then clustered in the second phase with a general clustering algorithm (e.g. KMeans or Leiden community detection). The proposed representation training phase is a new adaptation of the self-supervised contrastive learning framework, initially proposed for image processing, to scRNA-seq data. contrastive-sc has been compared with ten state-of-the-art techniques. A broad experimental study has been conducted on both simulated and real-world datasets, assessing multiple external and internal clustering performance metrics (i.e. ARI, NMI, Silhouette, Calinski scores). Our experimental analysis shows that contrastive-sc compares favorably with state-of-the-art methods on both simulated and real-world datasets. Conclusion: On average, our method identifies well-defined clusters in close agreement with ground truth annotations. 
Our method is computationally efficient, being fast to train and having a limited memory footprint. contrastive-sc maintains good performance when only a fraction of input cells is provided and is robust to changes in hyperparameters or network architecture. The decoupling between the creation of the embedding and the clustering phase allows the flexibility to choose a suitable clustering algorithm (i.e. KMeans when the number of expected clusters is known, Leiden otherwise) or to integrate the embedding with other existing techniques.
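The decoupling between embedding and clustering can be illustrated with a small sketch of the second phase alone. This is not the contrastive-sc implementation: the embedding below is a hand-written stand-in for the output of the trained network, and the `kmeans` function is a plain Lloyd's algorithm with a deliberately simple initialization (the first `k` points), chosen here only to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Stand-in for phase 1: in the abstract, this embedding is learned by a neural network.
embedding = [
    (0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0),
]

# Phase 2: any clustering algorithm can consume the embedding; k-means is used
# here because the number of expected clusters (2) is known.
labels = kmeans(embedding, k=2)
```

Because the clustering step only sees the embedding, swapping `kmeans` for a graph-based method such as Leiden would not require changing how the embedding is produced.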
Project description:Our understanding of miRNA activity at cellular resolution is thwarted by the inability of standard scRNA-seq protocols to capture miRNAs. We introduce a novel tool, miRSCAPE, to infer miRNA expression in a sample from its RNA-seq profile. We establish miRSCAPE's accuracy in 10 tumor and normal cohorts demonstrating its superiority over alternatives. miRSCAPE accurately infers cell type-specific miRNA activities (predicted versus observed fold-difference correlation ∼0.81) in two independent scRNA-seq datasets. We apply miRSCAPE to infer miRNA activities in scRNA clusters in pancreatic and lung adenocarcinomas, as well as in 56 cell types in the human cell landscape (HCL). In pancreatic and breast cancer scRNA-seq data, miRSCAPE recapitulates miRNAs associated with stemness and epithelial-mesenchymal transition (EMT) cell states, respectively. Overall, miRSCAPE recapitulates and refines miRNA biology at cellular resolution. miRSCAPE is freely available and is easily applicable to scRNA-seq data to infer miRNA activities at cellular resolution.
Project description:Coronavirus disease 2019 (COVID-19) threatens public health all over the world. It is well accepted that the immune cells in peripheral blood are widely involved in the pathological process of COVID-19. However, hematopoietic stem and progenitor cells (HSPCs), as the main source of peripheral immune cells, have not been well studied during COVID-19 infection. We comprehensively characterized the transcriptome changes of peripheral blood HSPCs after COVID-19 infection and vaccination by single-cell RNA-seq. Compared with healthy individuals, the proportion of HSPCs in COVID-19 patients was significantly increased. This increase might be partly attributed to enhanced HSPC proliferation upon COVID-19 infection. However, damage to HSPC stemness is reflected by a decrease in differentiation signals, which can be used as a potential specific indicator of the severity and duration of COVID-19 infection. After COVID-19 infection, type I interferon (IFN-I) signals in HSPCs were mostly activated, whereas translation signals were mostly inhibited. In addition, the body's response to COVID-19 vaccination is mild, while a second vaccination strengthens the immune response elicited by the primary vaccination. In conclusion, our study provides new insights into the immune mechanism of COVID-19 infection.
Project description:High dimensionality and noise have limited the new biological insights that can be discovered in scRNA-seq data. While dimensionality reduction tools have been developed to extract biological signals from the data, they often require manual determination of signal dimension, introducing user bias. Furthermore, a common data preprocessing method, log normalization, can unintentionally distort signals in the data. Here, we develop scLENS, a dimensionality reduction tool that circumvents the long-standing issues of signal distortion and manual input. Specifically, we identify the primary cause of signal distortion during log normalization and effectively address it by uniformizing cell vector lengths with L2 normalization. Furthermore, we utilize random matrix theory-based noise filtering and a signal robustness test to enable data-driven determination of the threshold for signal dimensions. Our method outperforms 11 widely used dimensionality reduction tools and performs particularly well for challenging scRNA-seq datasets with high sparsity and variability. To facilitate the use of scLENS, we provide a user-friendly package that automates accurate signal detection of scRNA-seq data without manual time-consuming tuning.
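The idea of uniformizing cell vector lengths after log normalization can be sketched as follows. This is only an illustration of the general idea, not the scLENS pipeline: the abstract does not give the exact preprocessing steps, and the library-size scale factor of 10,000 and the helper names `log_normalize` and `l2_normalize` are assumptions made for this example.

```python
import math


def log_normalize(counts, scale=10_000):
    """Scale counts to a common library size, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale) for c in counts]


def l2_normalize(vector):
    """Rescale a vector to unit Euclidean length (zero vectors are left as is)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def length(vector):
    return math.sqrt(sum(v * v for v in vector))


# Two toy cells with different sparsity (rows are cells, columns are genes).
cells = [
    [10, 0, 5, 0],  # counts spread over two genes
    [1, 0, 0, 0],   # a single detected gene
]

log_cells = [log_normalize(c) for c in cells]
unit_cells = [l2_normalize(c) for c in log_cells]

# After log normalization the two cell vectors have different lengths;
# after L2 normalization both have length 1.
lengths_before = [length(c) for c in log_cells]
lengths_after = [length(c) for c in unit_cells]
```

In this toy case the log-normalized cells end up with unequal vector lengths even though both were scaled to the same library size, which is the kind of discrepancy the L2 step removes.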
Project description:Droplet-based single-cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale to such high-dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest neighbour search technique, to develop a de novo clustering algorithm, dropClust, for large-scale single-cell data. On a number of real datasets, dropClust outperformed the existing best-practice methods in terms of execution time, clustering accuracy and detectability of minor cell sub-types.
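To illustrate how Locality Sensitive Hashing supports approximate nearest neighbour search, the sketch below uses random-hyperplane hashing, one common LSH family. The abstract does not say which LSH scheme dropClust uses, so this choice, the function names, and the parameters (8 hyperplanes, 3 dimensions, seed 0) are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw random hyperplane normals from a standard normal distribution."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(v * h for v, h in zip(vector, plane)) >= 0)
        for plane in hyperplanes
    )


def build_buckets(vectors, hyperplanes):
    """Group vector indices by signature; bucket mates are candidate neighbours."""
    buckets = defaultdict(list)
    for i, vec in enumerate(vectors):
        buckets[signature(vec, hyperplanes)].append(i)
    return buckets


planes = make_hyperplanes(n_planes=8, dim=3)
cells = [
    [1.0, 2.0, 0.5],
    [2.0, 4.0, 1.0],     # same direction as cell 0, twice the magnitude
    [-1.0, -2.0, -0.5],  # opposite direction to cell 0
]
buckets = build_buckets(cells, planes)
sig0, sig1, sig2 = (signature(c, planes) for c in cells)
```

Cells pointing in the same direction receive the same signature and land in the same bucket, so a neighbour search only needs to compare cells within a bucket rather than against the whole dataset.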