Dataset Information

An interpretable and adaptive autoencoder for efficient tissue deconvolution

ABSTRACT: Deconvolution models are a powerful tool for extracting cell type-specific information from bulk gene expression profiles. Current methods leverage advanced machine learning models and high-resolution sequencing, such as single-cell RNA-sequencing (scRNA-seq), and show promising results across diverse tissues and conditions. However, they still present important limitations. First, many depend on selecting a robust reference, a choice that can strongly affect the deconvolution. Second, pseudobulk data used for training and real bulk RNA-seq samples often exhibit strong distribution shifts, which are currently unaccounted for. Finally, most deconvolution approaches behave as black boxes, which can compromise the reliability of the results. Here, we present Sweetwater, an adaptive and interpretable autoencoder that efficiently deconvolves bulk samples by leveraging multiple classes of reference data. Moreover, we propose an improved way of generating training data from a mixture of FACS-sorted FASTQ files, reducing platform-specific biases and outperforming current single-cell-based references. Furthermore, we introduce a gold standard dataset to facilitate fair and accurate evaluation of deconvolution approaches. Finally, we demonstrate that Sweetwater adapts effectively to deconvolved samples during training, uncovering biologically meaningful patterns and enhancing the reliability of the results. Sweetwater is available at https://github.com/ML4BM-Lab/Sweetwater, and we anticipate it will expedite the accurate examination of high-throughput clinical data across diverse applications.
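
The abstract refers to training data generated by mixing reference profiles. As a rough illustration of that general idea only (not Sweetwater's actual procedure, which works from FACS-sorted FASTQ files), the sketch below mixes per-cell-type expression vectors with random proportions to form one pseudobulk sample. The function name `make_pseudobulk`, the cell-type labels, and the toy expression values are all hypothetical.

```python
import random


def make_pseudobulk(profiles, rng):
    """Mix per-cell-type expression profiles using random proportions.

    profiles: dict mapping cell type -> list of per-gene expression values
              (all lists must have the same length).
    rng:      a random.Random instance, passed in for reproducibility.
    Returns (proportions, mixed_expression).
    """
    cell_types = sorted(profiles)
    # Draw positive weights and normalize them so the proportions sum to 1.
    weights = [rng.random() + 1e-9 for _ in cell_types]
    total = sum(weights)
    proportions = {ct: w / total for ct, w in zip(cell_types, weights)}

    # The pseudobulk expression of each gene is the proportion-weighted sum
    # of that gene's expression across cell types.
    n_genes = len(profiles[cell_types[0]])
    mixed = [
        sum(proportions[ct] * profiles[ct][g] for ct in cell_types)
        for g in range(n_genes)
    ]
    return proportions, mixed


# Hypothetical toy reference: two cell types, three genes.
toy_profiles = {
    "T_cell": [10.0, 0.0, 5.0],
    "B_cell": [0.0, 8.0, 5.0],
}
props, bulk = make_pseudobulk(toy_profiles, random.Random(0))
```

A deconvolution model is then trained or evaluated on its ability to recover `props` from `bulk`; in this toy setup the known proportions serve as ground truth.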

ORGANISM(S): Homo sapiens

PROVIDER: GSE297720 | GEO | 2025/09/10

REPOSITORIES: GEO

Similar Datasets

Expression profiles of lineage-depleted (Lin-) cell and mono-nucleated cell (MNC) samples derived from human umbilical cord blood

Project description: The cellular composition of heterogeneous samples can be predicted from reference gene expression profiles that represent the homogeneous, constituent populations of the heterogeneous samples. However, existing methods fail when the reference profiles are not representative of the constituent populations. We developed PERT, a new probabilistic expression deconvolution method, to address this limitation. PERT was used to deconvolve the cellular composition of variably sourced and treated heterogeneous human blood samples. Our results indicate that even after correcting batch effects, cells presenting the same cell surface antigens display different transcriptional programs when they are uncultured versus culture-derived. Given gene expression profiles of culture-derived heterogeneous samples and profiles of uncultured reference populations, PERT was able to accurately recover proportions of pure populations composing the heterogeneous samples. We anticipate that PERT will be widely applicable to expression deconvolution problems using profiles from reference populations that vary from the corresponding constituent populations in cellular state but not cellular identity. Human umbilical cord blood-derived lineage negative cells and mononucleated cells. Cellular compositions of mononucleated cell and lineage negative cell compartments were deconvolved based on the gene expression profiles.

2012-09-13 | E-GEOD-40829 | biostudies-arrayexpress

Placental gene expression-based cell type deconvolution: Cell proportions drive preeclampsia gene expression differences

Project description: The placenta mediates adverse pregnancy outcomes, including preeclampsia, characterized by gestational hypertension and proteinuria. Placental cell type heterogeneity in preeclampsia is not well understood and limits mechanistic interpretation of bulk gene expression measures. We generated single-cell RNA-sequencing samples for integration with existing data to create the largest deconvolution reference of 19 fetal and 8 maternal cell types from placental villous tissue at term. We deconvoluted eight published microarray case-control studies of preeclampsia. Our findings indicate substantial placental cellular heterogeneity in preeclampsia that predicts previously observed bulk gene expression differences. Our deconvolution reference lays the groundwork for cellular heterogeneity-aware investigation into placental dysfunction and adverse birth outcomes.

2022-08-22 | GSE182381 | GEO

An interpretable and adaptive autoencoder for efficient tissue deconvolution

Project description: An interpretable and adaptive autoencoder for efficient tissue deconvolution

| PRJNA1265952 | ENA

SCDC: Deconvolution of Bulk Gene Expression by Single-Cell RNA Sequencing Data

Project description: This paper describes SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type-specific gene expression profiles from multiple scRNA-seq reference datasets.

2019-08-30 | GSE136148 | GEO

DNA-based immune profiling in murine blood with methylation cytometry deconvolution

Project description: Cell-type-specific patterns of DNA methylation have been leveraged to develop methods for accurate and reproducible DNA-based cell typing in human blood and other biospecimens. Recently developed and standardized genome-scale DNA methylation arrays for Mus musculus offer an instrument for quantifying cellular composition with epigenetic signatures specific to each cell type. We present a novel murine immune cell deconvolution tool that leverages DNA methylation profiles (FlowSorted.Blood.IlluminaMouseMethylation). Using flow cytometry, we purified polymorphonuclear neutrophils (PMN), monocytes, B lymphocytes, natural killer cells, and CD4+ and CD8+ T cells separately from BALB/c and C57BL/6 mice. Genome-scale DNA methylation was measured with the Illumina MouseMethylation BeadChip array at >285,000 CpG loci in purified cells for reference profile development. For testing and validation, DNA methylation was measured in both eye-bleed and terminal whole blood samples from four mouse strains. To benchmark and determine the accuracy of reference libraries for DNA methylation cell type deconvolution, flow cytometry of independent whole blood samples from four mouse strains was used. The Identifying Optimal Libraries (IDOL) algorithm identified an optimal reference library of 300 CpGs for deconvolution, with RMSE values across testing and validation samples of <1.7 for all cell types except B lymphocytes (RMSE <2.75). This methylation cytometry tool for mice offers the ability to conduct DNA-based immune profiling in archival samples and reduce confounding from cell type heterogeneity in molecular studies of whole blood samples.

2026-05-14 | GSE284254 | GEO

Generation of multi-omic datasets using high-throughput molecular profiling of DNA methylation human data

Project description: Tumor heterogeneity significantly affects cancer progression and therapeutic response, yet quantifying it from bulk molecular data remains challenging. Deconvolution algorithms, which estimate cell-type proportions in bulk samples, offer a potential solution. However, there is no consensus on the optimal algorithm for transcriptomic or methylomic data. Here, we present an unbiased evaluation framework for the first comprehensive comparison of deconvolution algorithms across both omic types, including reference-based and reference-free approaches. Our evaluation covers raw performance, stability, and computational efficiency under varying conditions, such as missing or additional cell types and diverse sample compositions. We design a reproducible workflow using containerization and publicly available code to ensure transparency and reusability. Our results highlight the strengths and limitations of various algorithms, providing practical guidance for selecting the best method based on data type and context. This benchmark sets a new standard for evaluating deconvolution methods and analyzing tumor heterogeneity.

2025-12-30 | GSE281305 | GEO

Generation of multi-omic datasets using high-throughput molecular profiling of transcriptomic human data

2025-12-30 | GSE281204 | GEO

CITEgeist: Cellular Indexing of Transcriptomes and Epitopes for Guided Exploration of Intrinsic Spatial Trends

Project description: Spatial transcriptomics provides insights into tissue architecture by linking gene expression with spatial localization. Current deconvolution methods rely heavily on single-cell RNA sequencing (scRNA-seq) references, which are costly and often unavailable, especially when the tissue under evaluation is limited, as in a core biopsy specimen. We present a novel tool, CITEgeist, that deconvolutes spatial transcriptomics data using antibody capture from the same slide as the reference, directly leveraging cell surface protein measurements from the same tissue section. This approach circumvents the limitations of scRNA-seq as a reference, offering a cost-effective and biologically grounded alternative. Our method employs mathematical optimization to estimate cell type proportions and gene expression profiles, incorporating sparsity constraints for robustness and interpretability. Benchmarks against state-of-the-art deconvolution methods show improved accuracy in cell type resolution, particularly in dense tumor microenvironments, while maintaining computational efficiency. This antibody-based tool advances spatial transcriptomics by providing a scalable, accurate, and reference-independent solution for deconvolution in complex tissues. We validate this tool on both simulated data and clinical samples, applying CITEgeist to translational pre-treatment and post-treatment ER+ breast tumors from an ongoing clinical trial, emphasizing its applicability and robustness.

2026-03-19 | GSE289326 | GEO

Spectral Cruncher: A Visualization Tool Integrating Manual Curation, Ion-Intensity Prediction, and De Novo Tag Generation

Project description: This dataset contains mass spectrometry raw files used as training data for SpecFormer, a transformer-based ion intensity prediction model integrated within PatternLab for Proteomics (Spectral Cruncher module). The dataset includes bulk proteomics data from HeLa cells and Mus musculus (C57BL/6) kidney tissue, as well as single-cell proteomics data from WT83 human brain organoids acquired using the cellenOne platform. All samples were analyzed on an Orbitrap Astral mass spectrometer using data-dependent acquisition (DDA). These raw files were used to train instrument-specific models for accurate fragment ion intensity prediction in both bulk and single-cell proteomics workflows. The SpecFormer model and analysis tools described in this dataset are freely available within PatternLab 5.1 at http://patternlabforproteomics.org/51.

2026-02-16 | PXD069898 | Pride

Insilico TOF top-down proteomics datasets generated using FTMS simulator (Spectroswiss)

Project description: This dataset consists of in silico TOF top-down proteomics spectra generated with the FTMS simulator software (Spectroswiss). The simulated datasets are designed to evaluate FDR estimation in spectral deconvolution. Protein sequences were used to generate MS datasets with varying resolution, noise, and charge characteristics. The dataset includes deconvolved TSV files and the corresponding in silico mzML input file.

2026-03-09 | PXD063631 | Pride
