Project description:Deconvolution models are a powerful tool for extracting cell type-specific information from bulk gene expression profiles. Current methods leverage advanced machine learning models and high-resolution sequencing, such as single-cell RNA-sequencing (scRNA-seq), showing promising results across diverse tissues and conditions. However, they still present important limitations. First, many depend on selecting a robust reference, which can strongly affect the deconvolution. Second, pseudobulk data used for training and real bulk RNA-seq samples often exhibit strong distribution shifts, which are currently unaccounted for. Finally, most deconvolution approaches behave as black boxes, which can compromise the reliability of the results. Here, we present Sweetwater, an adaptive and interpretable autoencoder that efficiently deconvolves bulk samples leveraging multiple classes of reference data. Moreover, we propose an improved way of generating training data from a mixture of FACS-sorted FASTQ files, reducing platform-specific biases and outperforming current single-cell-based references. Furthermore, we introduce a gold standard dataset to facilitate fair and accurate evaluation of deconvolution approaches. Finally, we demonstrate that Sweetwater adapts effectively to deconvolved samples during training, uncovering biologically meaningful patterns and enhancing the reliability of the results. Sweetwater is available at https://github.com/ML4BM-Lab/Sweetwater, and we anticipate it will expedite the accurate examination of high-throughput clinical data across diverse applications.
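The idea of training on simulated mixtures can be illustrated with a minimal sketch. It is not the Sweetwater pipeline, which mixes FACS-sorted FASTQ files. It is a generic expression-level simulation under stated assumptions: the function name `make_pseudobulk`, the toy `profiles` matrix, and the Dirichlet concentration `alpha` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def make_pseudobulk(profiles, n_samples, alpha=1.0, seed=0):
    """Simulate pseudobulk samples as convex mixtures of cell-type profiles.

    profiles: array of shape (n_cell_types, n_genes), one reference
              expression profile per cell type (hypothetical toy input).
    Returns (bulk, fractions) with shapes (n_samples, n_genes) and
    (n_samples, n_cell_types).
    """
    rng = np.random.default_rng(seed)
    n_types = profiles.shape[0]
    # Draw cell-type fractions that are non-negative and sum to 1 per sample.
    fractions = rng.dirichlet(np.full(n_types, alpha), size=n_samples)
    # Each pseudobulk sample is the fraction-weighted sum of the profiles.
    bulk = fractions @ profiles
    return bulk, fractions

# Toy reference: 3 cell types x 4 genes (made-up values).
profiles = np.array([
    [10.0, 0.0, 5.0, 1.0],
    [0.0, 8.0, 5.0, 2.0],
    [1.0, 1.0, 0.0, 9.0],
])
bulk, fractions = make_pseudobulk(profiles, n_samples=5)
```

The pairs of `bulk` and `fractions` produced this way are the kind of labeled data a deconvolution model can be trained on. The distribution shift mentioned above arises because real bulk samples need not follow this idealized linear mixing.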
Project description:Single-cell RNA-sequencing has become a powerful tool to study biologically significant characteristics at high resolution. However, its application to emerging data is currently limited by its intrinsic techniques. Here, we introduce Tissue-AdaPtive autoEncoder (TAPE), a deep learning method connecting bulk RNA-seq and single-cell RNA-seq to achieve precise deconvolution in a short time. By constructing an interpretable decoder and training under a unique scheme, TAPE can predict cell-type fractions and cell-type-specific gene expression tissue-adaptively. Compared with popular methods on several datasets, TAPE has a better overall performance and comparable accuracy at the cell-type level. Additionally, it is more robust across different cell types, faster, and sensitive enough to provide biologically meaningful predictions. Moreover, through the analysis of clinical data, TAPE shows its ability to predict cell-type-specific gene expression profiles with biological significance. We believe that TAPE will enable and accelerate the precise analysis of high-throughput clinical data across a wide range of settings.
Project description:Numerous multi-omic investigations of cancer tissue have documented varying and poor pairwise transcript:protein quantitative correlations, and most deconvolution tools aiming to predict cell type proportions (cell admixture) have been developed and credentialed using transcript-level data alone. To estimate cell admixture using protein abundance data, we analyzed proteome and transcriptome data generated from contrived admixtures of tumor, stroma, and immune cell models or from cell populations selectively harvested from the tissue microenvironment by laser microdissection of high grade serous ovarian cancer (HGSOC) tumors. Co-quantified transcripts and proteins performed similarly in estimating stroma and immune cell admixture with two commonly used deconvolution algorithms, ESTIMATE and ConsensusTME (r ≥ 0.63). Here we have developed and optimized protein-based signatures to estimate cell admixture proportions and benchmarked these using bulk tumor proteomics data from over 150 HGSOC patients. The optimized protein signatures supporting cell type proportion estimates from bulk tissue proteomic data are available at https://lmdomics.org/ProteoMixture/.
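An agreement statistic such as the r reported above can be computed as in the following minimal sketch. The abstract does not say which correlation coefficient was used, so the sketch assumes Pearson's r. The helper `pearson_r` and the toy values are hypothetical and are not data from the study.

```python
import numpy as np

def pearson_r(known, estimated):
    """Pearson correlation between known admixture and estimated scores."""
    known = np.asarray(known, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(known, estimated)[0, 1])

# Hypothetical contrived admixtures: known stroma fraction per sample
# and the score returned by some deconvolution tool for the same samples.
known_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
estimated_score = [0.05, 0.2, 0.55, 0.7, 0.9]
r = pearson_r(known_fraction, estimated_score)
```

Correlation only measures whether estimates track the known admixture linearly. A tool can score a high r while still being biased in absolute terms, so an error measure such as RMSE is a useful complement when true proportions are available.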
Project description:Plasma cell-free DNA (cfDNA) is a noninvasive biomarker for cell death of all organs. Deciphering the tissue origin of cfDNA can reveal abnormal cell death caused by disease, which has great clinical potential in disease detection and monitoring. Despite this promise, sensitive and accurate quantification of tissue-derived cfDNA remains challenging for existing methods due to the limited characterization of tissue methylation and the reliance on unsupervised methods. To fully exploit the clinical potential of tissue-derived cfDNA, here we present one of the largest comprehensive, high-resolution methylation atlases, based on 521 noncancer tissue samples spanning 29 major types of human tissues. We systematically identified fragment-level tissue-specific methylation patterns and extensively validated them in orthogonal datasets. Based on this rich tissue methylation atlas, we developed the first supervised tissue deconvolution approach, a deep-learning-powered model, cfSort, for sensitive and accurate tissue deconvolution in cfDNA. On the benchmarking data, cfSort showed superior sensitivity and accuracy compared to the existing methods. We further demonstrated the clinical utility of cfSort with two potential applications: aiding disease diagnosis and monitoring treatment side effects. The tissue-derived cfDNA fraction estimated by cfSort reflected the clinical outcomes of the patients. In summary, the tissue methylation atlas and cfSort enhanced the performance of tissue deconvolution in cfDNA, thus facilitating cfDNA-based disease detection and longitudinal treatment monitoring.
Project description:Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we introduce an experimental design to match inter-platform biological signals, hence revealing the technological discrepancy, and then develop a deconvolution framework called DeMixSC using the better-matched, i.e., benchmark, data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of matched-tissue-type for large-scale deconvolution. Our results using a benchmark dataset of healthy retinas suggest much-improved deconvolution accuracy. Further analysis of a cohort of 453 patients with age-related macular degeneration supports the broad applicability of DeMixSC. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched dataset to resolve this challenge. The developed DeMixSC framework is generally applicable for deconvolving large cohorts of disease tissues, and potentially cancer.
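The weighted nonnegative least-squares idea can be sketched as follows. This is a generic illustration, not the DeMixSC implementation: the abstract does not specify how gene weights are derived, so the `weights` vector, the helper name `weighted_nnls_deconvolve`, and the toy signature matrix are all assumptions. The sketch relies on the fact that minimizing the weighted squared error is equivalent to running ordinary NNLS after scaling each gene's row by the square root of its weight.

```python
import numpy as np
from scipy.optimize import nnls

def weighted_nnls_deconvolve(signature, bulk, weights):
    """Estimate cell-type proportions by weighted NNLS.

    signature: (n_genes, n_cell_types) reference expression matrix.
    bulk:      (n_genes,) bulk expression vector.
    weights:   (n_genes,) non-negative gene weights (hypothetical here);
               a small weight down-weights an unreliable gene.
    """
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Scale rows so that ordinary NNLS minimizes sum_g w_g * residual_g**2.
    coef, _ = nnls(signature * sw[:, None], bulk * sw)
    total = coef.sum()
    # Normalize to proportions; leave all-zero solutions untouched.
    return coef / total if total > 0 else coef

# Toy example: 5 genes x 2 cell types, true proportions 0.3 / 0.7.
signature = np.array([[10.0, 1.0], [1.0, 10.0], [5.0, 5.0],
                      [8.0, 2.0], [3.0, 7.0]])
true_props = np.array([0.3, 0.7])
bulk = signature @ true_props
bulk[4] += 20.0  # simulate one gene with a large platform discrepancy

uniform = weighted_nnls_deconvolve(signature, bulk, np.ones(5))
adjusted = weighted_nnls_deconvolve(
    signature, bulk, np.array([1.0, 1.0, 1.0, 1.0, 1e-6]))
```

In this toy setup the estimate with uniform weights is pulled away from the true proportions by the single discrepant gene, while down-weighting that gene brings the estimate back close to them.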
Project description:Deconvolution is a methodology for estimating the immune cell proportions from the transcriptome. It is mainly applied to blood-derived samples and tumor tissues. However, the influence of tissue-specific modeling on the estimation results has rarely been investigated. In this study, we constructed a system to evaluate the performance of the deconvolution method on liver transcriptome data. Correspondence: Tadahaya Mizuno
Project description:The project aimed to investigate the possibility of using proteomics data to deconvolute cell line proportions in mixed samples. Samples containing either HEK 293, Caco-2, or A549 cells, as well as mixtures of the three cell lines, were analysed using the total protein approach. These data were then used for proteomics-informed deconvolution. The results show that proteome deconvolution provides an effective tool for investigating cellular composition in mixed samples. The approach was later also applied to in silico mixtures of primary human liver cells and liver tissue. However, those data are presented elsewhere.
Project description:Accelerating development of complex in vitro models (CIVMs) drives a need for additional methods to characterize these systems for use in toxicology and drug development. Relative to traditional cell culture, CIVMs may use a lower number of cells, are cultured for longer periods of time, and typically involve engineered 3-D cell organization. Standard single-cell assessment tools are not conducive to these types of complex models due to cell isolation stress, cell loss, and high cost. In this report, we benchmark RNA-seq deconvolution, which utilizes publicly available scRNA-seq datasets to predict cell proportions from bulk RNA-seq data derived from two CIVMs: a stem-cell-based human intestine epithelia model and a neonatal rodent testis model. We consider the impact of multiple deconvolution methods and of multiple scRNA-seq imputation methods, which aim to restore the gene distribution of the original tissue and thereby generate a better cell type signature. The accuracy of deconvolution methods varied significantly in our analyses but provided valuable information on the emergence of an enterocyte cell population from the LGR5+ crypt stem cells following differentiation. In the testis model, deconvolution indicated that a small population of germ cells was retained over time, and that cell type estimates remained stable with physiologically relevant hormone stimulation. In our analysis, using imputed single-cell references improved deconvolution accuracy. Deconvolution can be a useful tool for novel CIVM characterization, especially with rapidly growing libraries of single-cell data across tissues and developmental time.