Dataset Information

A comprehensive benchmark of single-cell Hi-C embedding tools

ABSTRACT: Embedding is the key step in single-cell Hi-C (scHi-C) analysis, which relies on capturing biologically meaningful heterogeneity at various levels of genome architecture. To understand the strengths and limitations of existing tools in various applications, here we use ten scHi-C datasets to benchmark thirteen embedding tools, including Va3DE, a new convolutional neural network model that can accommodate large cell numbers. We built a software framework to decouple the preprocessing options of existing tools and found that no single tool works best across all datasets under default settings. The difficulty levels and preferred resolutions differ between benchmark datasets, and the choice of data representation and preprocessing strongly impacts embedding performance. Embedding cells from early embryonic stages relies on long-range compartment-scale contacts, but resolving cell cycle phases and complex tissues requires short-range loop-scale contacts. Both random-walk and inverse document frequency (IDF) transformations favor long-range “compartment-scale” over short-range “loop-scale” embedding, while deep-learning methods better overcome sparsity at both scales and are more versatile across resolutions. Finally, “diagonal integration” with an independent data modality is a promising approach to distinguish similar cell subpopulations. Our findings underscore the significance of appropriate priors for scHi-C embedding and offer new insights into genome architecture heterogeneity.
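
For readers unfamiliar with the IDF transformation mentioned in the abstract, the following is a minimal sketch of generic TF-IDF weighting applied to a cells-by-contacts count matrix. It is not the benchmark's own code: the function name `idf_transform`, the smoothing formula, and the toy matrix are all assumptions made for illustration, and individual tools may define the weighting differently.

```python
import numpy as np

def idf_transform(contacts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: TF-IDF weighting of a (cells x bin-pairs) count matrix."""
    contacts = np.asarray(contacts, dtype=float)
    n_cells = contacts.shape[0]
    # Term frequency: scale each cell's counts by its total contact count.
    totals = contacts.sum(axis=1, keepdims=True)
    tf = np.divide(contacts, totals, out=np.zeros_like(contacts), where=totals > 0)
    # Document frequency: number of cells in which each bin pair is observed.
    df = (contacts > 0).sum(axis=0)
    # Smoothed IDF (an assumed variant): rarer bin pairs receive larger weights.
    idf = np.log(1.0 + n_cells / (1.0 + df))
    return tf * idf

# Toy input (made up): 3 cells, 3 bin pairs.
cells = np.array([[2, 0, 1],
                  [1, 1, 0],
                  [3, 0, 0]])
weighted = idf_transform(cells)
```

In this sketch a contact seen in every cell is down-weighted relative to one seen in a single cell, which is the general intuition behind applying IDF before dimensionality reduction.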

ORGANISM(S): Mus musculus, Homo sapiens

PROVIDER: GSE305523 | GEO | 2025/08/19

REPOSITORIES: GEO

Similar Datasets

SnapHiC-D: a computational pipeline to identify differential chromatin contacts from single cell Hi-C data

Project description:Single cell Hi-C (scHi-C) has been used to map genome organization in complex tissues. However, computational tools to detect dynamic chromatin contacts from scHi-C datasets in development and through disease pathogenesis are still lacking. Here, we present SnapHiC-D, a computational pipeline to identify differential chromatin contacts (DCCs) between two scHi-C datasets. Compared to methods designed for bulk Hi-C data, SnapHiC-D detects DCCs with high sensitivity and accuracy. We used SnapHiC-D to identify cell-type-specific chromatin contacts at 10 kilobase resolution in mouse hippocampal and human prefrontal cortical tissues, and demonstrated that DCCs detected in the cortical and hippocampal cell types are generally correlated with cell-type-specific gene expression patterns and epigenomic features. SnapHiC-D is freely available at https://github.com/HuMingLab/SnapHiC-D.

2023-08-24 | GSE210585 | GEO

scENCORE: leveraging single-cell epigenetic data to predict genome conformation using graph embedding

Project description:Recent advances in chromatin architecture profiling technologies, such as single-cell Hi-C (scHi-C), allow us to dissect the heterogeneity of chromosome higher-order structures across diverse cell states and different individuals. However, scHi-C experiments are still expensive and not immediately available for population-scale profiling. Here, we present scENCORE, a computational method to reconstruct personalized and cell-type-specific higher-order chromatin structures, such as A/B compartments, directly from more cost-effective and widely available single-cell epigenetic data (e.g., scATAC-seq). We apply scENCORE to scATAC-seq data from post-mortem prefrontal cortex brains and demonstrate its utility to 1) project megabase-scale chromatin regions into a lower-dimensional space by leveraging graph embedding technologies based on cell-type-specific co-variability patterns, 2) define A/B compartments via unsupervised clustering, and 3) perform an alignment algorithm for multi-graph embedding to derive comparable chromatin representations and highlight dynamic chromatin compartments across cell states and individuals. Validated by Hi-C experiments using FACS-sorted cells, scENCORE can faithfully reconstruct cell-type-specific chromatin compartments. Furthermore, scENCORE uniformly constructs chromosome conformation across population-scale scATAC-seq data and discovers key 3D structural switching events associated with psychiatric disorders. In summary, scENCORE allows cost-effective, cell-type-specific, and personalized reconstruction that delineates higher-order chromatin structures.

2023-09-27 | GSE216270 | GEO

Metagenome benchmark

Project description:Benchmark of metagenome analysis tools

| PRJEB8286 | ENA

RNA splicing analysis using heterogeneous and large RNA-seq datasets

Project description:The ubiquity of RNA-seq has led to many methods that use RNA-seq data to analyze variations in RNA splicing. However, available methods are not well suited for handling heterogeneous and large datasets. Such datasets scale to thousands of samples across dozens of experimental conditions, exhibit increased variability compared to biological replicates, and involve thousands of unannotated splice variants resulting in increased transcriptome complexity. We describe here a suite of algorithms and tools implemented in the MAJIQ v2 package to address challenges in detection, quantification, and visualization of splicing variations from such datasets. Here we created a large, realistic synthetic RNA-seq dataset of 150 simulated cerebellum samples and 150 skeletal muscle samples using BEERS. We use this as a benchmark dataset to assess the advantages of MAJIQ v2 compared to existing methods.

2023-01-03 | GSE222044 | GEO

A single cell RNAseq benchmark experiment embedding "controlled" cancer heterogeneity

Project description:A single cell RNAseq benchmark experiment embedding "controlled" cancer heterogeneity

| PRJNA1019356 | ENA

Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes [cell mixtures bulk]

Project description:RNA profiling technologies at single-cell resolutions, including single-cell and single-nuclei RNA sequencing (scRNA-Seq and snRNA-Seq, scnRNA-Seq for short), can help characterize the composition of tissues and reveal cells that influence key healthy and disease functions. However, the use of these technologies is challenging because of their relatively high costs and exacting sample collection requirements. Computational deconvolution methods that infer the composition of RNA-Seq-profiled samples using scnRNA-Seq-characterized cell types can expand the benefit of these technologies, but their effectiveness remains controversial. We produced the first systematic evaluation of deconvolution methods on datasets with either known compositions or based on concurrent RNA-Seq and scnRNA-Seq profiles. Our analyses revealed biases that are common to scnRNA-Seq 10X Genomics assays and illustrated the importance of accurate and properly controlled data preprocessing and method selection and optimization. Moreover, our results suggested that concurrent RNA-Seq and scnRNA-Seq profiles can help improve the accuracy of both scnRNA-Seq preprocessing and the deconvolution methods that employ them. Indeed, our proposed method, Single-cell RNA Quantity Informed Deconvolution (SQUID), combined RNA-Seq transformation and a dampened weighted least squares deconvolution approach to consistently outperform other methods in predicting the composition of cell mixtures and tissue samples. Moreover, our analysis suggested that only SQUID could identify outcomes-predictive cancer cell subtypes in pediatric acute myeloid leukemia and neuroblastoma datasets.

2023-04-28 | GSE220605 | GEO

Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes [AML single-cell]

2023-04-28 | GSE220651 | GEO

Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes [cell mixtures scRNA-seq]

2023-04-28 | GSE220606 | GEO

Effective methods for bulk RNA-seq deconvolution using scnRNA-seq transcriptomes [AML Bulk]

2023-04-28 | GSE220607 | GEO

MaxLFQ label-free quantification algorithm benchmark datasets

Project description:We developed a set of algorithms for label-free quantification, termed MaxLFQ, embedded into MaxQuant. This contains two datasets to benchmark MaxLFQ: The proteome benchmark dataset consists of HeLa and E. coli lysates mixed at defined ratios. The dynamic range benchmark dataset consists of UPS1/UPS2 standards (Sigma) spiked into E. coli lysates and quantified against each other.

2014-09-17 | PXD000279 | Pride
