Project description:Single-cell RNA sequencing (scRNA-seq) data are noisy and sparse. Here, we show that transfer learning across datasets remarkably improves data quality. By coupling a deep autoencoder with a Bayesian model, SAVER-X extracts transferable gene-gene relationships across data from different labs, varying conditions and divergent species, to denoise new target datasets.
Project description:Since many single-cell RNA-seq (scRNA-seq) data are obtained after cell sorting, such as when investigating immune cells, tracking cellular landscape by integrating single-cell data with spatial transcriptomic data is limited due to cell type and cell composition mismatch between the two datasets. We developed a method, spSeudoMap, which utilizes sorted scRNA-seq data to create virtual cell mixtures that closely mimic the gene expression of spatial data and trains a domain adaptation model for predicting spatial cell compositions. The method was applied in brain and breast cancer tissues and accurately predicted the topography of cell subpopulations. spSeudoMap may help clarify the roles of a few, but crucial cell types.
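The idea of building virtual cell mixtures from annotated single-cell data can be sketched as follows. This is a minimal illustration, not the spSeudoMap procedure: the function name `make_pseudospots`, uniform random sampling with replacement, and the toy Poisson counts are all assumptions made here for the example. Each synthetic spot sums the expression of a few randomly drawn cells and records the resulting cell-type fractions, which could then serve as training labels for a composition model.

```python
import numpy as np

def make_pseudospots(counts, labels, n_types, n_spots, cells_per_spot, rng):
    """Sum randomly drawn cells into synthetic mixtures (hypothetical helper).

    counts: (n_cells, n_genes) expression matrix
    labels: (n_cells,) integer cell-type labels in [0, n_types)
    Returns the mixtures and the cell-type fractions of each mixture.
    """
    n_cells, n_genes = counts.shape
    mixtures = np.zeros((n_spots, n_genes))
    fractions = np.zeros((n_spots, n_types))
    for s in range(n_spots):
        # Assumption: cells are drawn uniformly at random, with replacement.
        idx = rng.choice(n_cells, size=cells_per_spot, replace=True)
        mixtures[s] = counts[idx].sum(axis=0)
        fractions[s] = np.bincount(labels[idx], minlength=n_types) / cells_per_spot
    return mixtures, fractions

# Toy data: 50 cells, 8 genes, 3 cell types (all values are synthetic).
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 8)).astype(float)
labels = rng.integers(0, 3, size=50)
mixtures, fractions = make_pseudospots(
    counts, labels, n_types=3, n_spots=5, cells_per_spot=10, rng=rng
)
```

Each row of `fractions` sums to 1 by construction, so the pair (`mixtures`, `fractions`) has the shape of a supervised training set for predicting spot compositions.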
Project description:Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
Project description:Spatial transcriptomics (ST) technologies have revolutionized our understanding of cellular ecosystems. However, these technologies face challenges such as sparse gene signals and limited gene detection capacities, which hinder their ability to fully capture comprehensive spatial gene expression profiles. To address these limitations, we propose leveraging single-cell RNA sequencing (scRNA-seq), which provides comprehensive gene expression data but lacks spatial context, to enrich ST profiles. Herein, we introduce SpaIM, an innovative style transfer learning model that utilizes scRNA-seq information to predict unmeasured gene expression in ST data, thereby improving gene coverage and expression estimates. SpaIM segregates scRNA-seq and ST data into data-agnostic contents and data-specific styles: the contents capture the commonalities between the two data types, while the styles highlight their unique differences. By integrating the strengths of scRNA-seq and ST, SpaIM overcomes data sparsity and limited gene coverage issues, making significant advancements over 12 existing methods. This improvement is demonstrated across 53 diverse ST datasets, spanning sequencing- and imaging-based spatial technologies in various tissue types. Additionally, SpaIM enhances downstream analyses, including the detection of ligand-receptor interactions, spatial domain characterization, and identification of differentially expressed genes. Released as open-source software, SpaIM increases accessibility for spatial transcriptomics analysis. In summary, SpaIM represents a pioneering approach to enrich spatial transcriptomics using scRNA-seq data, enabling precise gene expression imputation and advancing the field of spatial transcriptomics research.
Project description:The development of single-cell sequencing technologies has allowed researchers to gain important new knowledge about the expression profile of genes in thousands of individual cells of a model organism or tissue. A common disadvantage of this technology is the loss of the three-dimensional (3-D) structure of the cells. Consequently, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized the Single-Cell Transcriptomics Challenge, in which we participated, with the aim of addressing the following two problems: (a) to identify the top 60, 40, and 20 genes of the Drosophila melanogaster embryo that contain the most spatial information and (b) to reconstruct the 3-D arrangement of the embryo using information from those genes. We developed two independent techniques, based on the least absolute shrinkage and selection operator (Lasso) and on deep neural networks (NNs), which are applied to high-dimensional single-cell sequencing data in order to accurately identify genes that contain spatial information. Our first technique, Lasso.TopX, utilizes the Lasso and ranking statistics and allows users to define the specific number of features they are interested in. The NN approach utilizes weak supervision for linear regression to accommodate uncertain or probabilistic training labels. We show, individually for both techniques, that we are able to identify a user-defined number of important, stable genes containing the most spatial information. The results from both techniques achieve high performance when reconstructing spatial information in D. melanogaster and also generalize to zebrafish (Danio rerio). Furthermore, we identified novel D. melanogaster genes that carry important positional information and were not previously suspected. We also show how the indirect use of the full datasets' information can lead to data leakage and bias the evaluation toward overestimating the model's performance. Lastly, we discuss the applicability of our approaches to other feature selection problems outside the realm of single-cell sequencing and the importance of being able to handle probabilistic training labels. Our source code and detailed documentation are available at https://github.com/TJU-CMC-Org/SingleCell-DREAM/.
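One way to obtain a user-defined number of features from the Lasso is to relax the penalty until enough coefficients become non-zero, then rank them by magnitude. The sketch below illustrates that general idea only; it is not the authors' Lasso.TopX implementation (see the repository above for that). The function names `lasso_cd` and `top_x_features`, the coordinate-descent solver, the penalty grid, and the synthetic data are assumptions introduced for this example.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.
    """
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]  # remove feature j's current contribution
            rho = X[:, j] @ r / n
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]  # add the updated contribution back
    return w

def top_x_features(X, y, x, n_alphas=30):
    """Return indices of x features, ranked by |coefficient| (hypothetical helper)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    y = y - y.mean()
    alpha_max = np.abs(X.T @ y).max() / len(y)  # above this, all coefficients are zero
    # Assumption: a log-spaced grid from alpha_max down to alpha_max / 1000.
    for alpha in alpha_max * np.logspace(0, -3, n_alphas):
        w = lasso_cd(X, y, alpha)
        selected = np.flatnonzero(w)
        if len(selected) >= x:
            order = np.argsort(-np.abs(w[selected]))
            return selected[order[:x]]
    return np.flatnonzero(w)  # fewer than x features ever entered the model

# Synthetic data: only features 2 and 7 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + 0.1 * rng.normal(size=100)
top = top_x_features(X, y, x=2)
```

Stopping at the strongest penalty that yields at least `x` non-zero coefficients keeps the selection sparse; other ranking rules (for example, stability across resampled fits) would be equally plausible choices.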
Project description:The field of spatial transcriptomics is rapidly expanding, and with it the repertoire of available technologies. However, several of the transcriptome-wide spatial assays do not operate on a single cell level, but rather produce data comprised of contributions from a - potentially heterogeneous - mixture of cells. Still, these techniques are attractive to use when examining complex tissue specimens with diverse cell populations, where complete expression profiles are required to properly capture their richness. Motivated by an interest to put gene expression into context and delineate the spatial arrangement of cell types within a tissue, we here present a model-based probabilistic method that uses single cell data to deconvolve the cell mixtures in spatial data. To illustrate the capacity of our method, we use data from different experimental platforms and spatially map cell types from the mouse brain and developmental heart, which arrange as expected.
Project description:Single-cell RNA sequencing (scRNA-seq) data provides unprecedented information on cell fate decisions; however, the spatial arrangement of cells is often lost. Several recent computational methods have been developed to impute spatial information onto a scRNA-seq dataset through analyzing known spatial expression patterns of a small subset of genes known as a reference atlas. However, there is a lack of comprehensive analysis of the accuracy, precision, and robustness of the mappings, along with the generalizability of these methods, which are often designed for specific systems. We present a system-adaptive deep learning-based method (DEEPsc) to impute spatial information onto a scRNA-seq dataset from a given spatial reference atlas. By introducing a comprehensive set of metrics that evaluate the spatial mapping methods, we compare DEEPsc with four existing methods on four biological systems. We find that while DEEPsc has comparable accuracy to other methods, an improved balance between precision and robustness is achieved. DEEPsc provides a data-adaptive tool to connect scRNA-seq datasets and spatial imaging datasets to analyze cell fate decisions. Our implementation with a uniform API can serve as a portal with access to all the methods investigated in this work for spatial exploration of cell fate decisions in scRNA-seq data. All methods evaluated in this work are implemented as open-source software with a uniform interface.
Project description:Single-cell and spatial transcriptome sequencing, two recently optimized transcriptome sequencing methods, are increasingly used to study cancer and related diseases. Cell annotation, particularly malignant cell annotation, is essential for in-depth analyses in these studies. However, current algorithms lack accuracy and generalization, making it difficult to consistently and rapidly infer malignant cells from pan-cancer data. To address this issue, we present Cancer-Finder, a domain generalization-based deep-learning algorithm that can rapidly identify malignant cells in single-cell data with an average accuracy of 95.16%. More importantly, by replacing the single-cell training data with spatial transcriptomic datasets, Cancer-Finder can accurately identify malignant spots on spatial slides. Applied to 5 clear cell renal cell carcinoma spatial transcriptomic samples, Cancer-Finder demonstrates a good ability to identify malignant spots and identifies a gene signature consisting of 10 genes that are significantly co-localized and enriched at the tumor-normal interface and have a strong correlation with the prognosis of clear cell renal cell carcinoma patients. In conclusion, Cancer-Finder is an efficient and extensible tool for malignant cell annotation.
Project description:Motivation: Single-cell RNA sequencing (scRNA-seq) enables high-throughput transcriptomic profiling at single-cell resolution. The inherent spatial location is crucial for understanding how single cells orchestrate multicellular functions and drive diseases. However, spatial information is often lost during tissue dissociation. Spatial transcriptomic (ST) technologies can provide precise spatial gene expression atlases, but their practicality is constrained by the number of genes they can assay, by the associated costs at a larger scale, and by the need for fine-grained cell-type annotation. By transferring knowledge between scRNA-seq and ST data through cell correspondence learning, it is possible to recover the spatial properties inherent in scRNA-seq datasets. Results: In this study, we introduce COME, a COntrastive Mapping lEarning approach that learns a mapping between ST and scRNA-seq data to recover the spatial information of scRNA-seq data. Extensive experiments demonstrate that the proposed COME method effectively captures precise cell-spot relationships and outperforms previous methods in recovering spatial location for scRNA-seq data. More importantly, our method is capable of precisely identifying biologically meaningful information within the data, such as the spatial structure of missing genes, spatial hierarchical patterns, and the cell-type compositions for each spot. These results indicate that the proposed COME method can help to understand the heterogeneity and activities among cells within tissue environments. Availability and implementation: COME is freely available on GitHub (https://github.com/cindyway/COME).
Project description:Summary: The limited resolution of spatial transcriptomics (ST) assays in the past has led to the development of cell type annotation methods that separate the convolved signal based on available external atlas data. In light of the rapidly increasing resolution of ST assay technologies, we made available and investigated the performance of a deconvolution-free, marker-based cell annotation method called scType. In contrast to existing methods, the spatial application of scType requires neither computationally strenuous deconvolution nor large single-cell reference atlases. We show that scType enables ultra-fast and accurate identification of abundant cell types from ST data, especially when a large enough panel of genes is detected. Examples of such assays are Visium and Slide-seq, which currently offer the best trade-off between high resolution and number of genes detected by the assay for cell type annotation. Availability and implementation: The scType R and Python source code for spatial data is openly available on GitHub (https://github.com/kris-nader/sp-type or https://github.com/kris-nader/sc-type-py). Step-by-step tutorials for R and Python spatial data analysis can be found at https://github.com/kris-nader/sp-type and https://github.com/kris-nader/sc-type-py/blob/main/spatial_tutorial.md, respectively.
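Deconvolution-free, marker-based annotation can be illustrated with a very small scoring routine. This is a simplified sketch and not the scType API or its exact scoring rule: the functions `marker_scores` and `annotate`, the plain z-score normalization, the division by the square root of the marker count, and the toy gene and cell-type names are all assumptions made for the example. Each spot receives one score per cell type from its marker genes and is assigned the highest-scoring type.

```python
import numpy as np

def marker_scores(expr, genes, markers):
    """Score each spot for each cell type from its marker genes.

    expr:    (n_spots, n_genes) expression matrix
    genes:   list of gene names, one per column of expr
    markers: dict mapping cell-type name -> list of marker gene names
    """
    # Z-score every gene across spots; the small constant avoids division by zero.
    z = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-9)
    col = {g: i for i, g in enumerate(genes)}
    types = list(markers)
    scores = np.zeros((expr.shape[0], len(types)))
    for t, name in enumerate(types):
        idx = [col[g] for g in markers[name] if g in col]
        if idx:
            # Assumption: normalize by sqrt(marker count) so that types with
            # many markers do not dominate purely by having more terms.
            scores[:, t] = z[:, idx].sum(axis=1) / np.sqrt(len(idx))
    return types, scores

def annotate(expr, genes, markers):
    """Assign each spot the cell type with the highest marker score."""
    types, scores = marker_scores(expr, genes, markers)
    return [types[i] for i in scores.argmax(axis=1)]

# Toy example with hypothetical gene and cell-type names.
genes = ["g1", "g2", "g3", "g4"]
markers = {"typeA": ["g1", "g2"], "typeB": ["g3"], "typeC": ["g4"]}
expr = np.array([
    [9.0, 8.0, 0.0, 0.0],
    [0.0, 1.0, 9.0, 0.0],
    [0.0, 0.0, 1.0, 9.0],
])
calls = annotate(expr, genes, markers)
```

Because the score uses only a short marker list per cell type, no single-cell reference matrix and no mixture model are needed, which is the property the abstract above emphasizes.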