Dataset Information

Joint inference of discrete cell types and continuous type-specific variability in single-cell datasets with MMIDAS.


ABSTRACT: Reproducible definition and identification of cell types is essential for investigating their biological function and for understanding their relevance in the context of development, disease, and evolution. Current approaches either model variability in the data as continuous latent factors and then cluster as a separate step, or apply clustering directly to the data. Clusters obtained in this manner are treated as putative cell types in atlas-scale efforts such as those for mammalian brains. We show that such approaches can make qualitative mistakes and fail to identify cell types robustly, particularly when the number of cell types is in the hundreds or even thousands. Here, we propose an unsupervised method, MMIDAS (Mixture Model Inference with Discrete-coupled AutoencoderS), which combines a generalized mixture model with a multi-armed deep neural network to jointly infer the discrete type and the continuous type-specific variability. We develop this framework so that it can be applied to both uni-modal and multi-modal datasets. Using four recent datasets of brain cells spanning different technologies, species, and conditions, we demonstrate that MMIDAS significantly outperforms state-of-the-art models in inferring interpretable discrete and continuous representations of cellular identity, and uncovers novel biological insights. Our unsupervised framework can thus help researchers identify more robust cell types, study cell type-dependent continuous variability, interpret such latent factors in the feature domain, and study multi-modal datasets.

SUBMITTER: Marghi Y 

PROVIDER: S-EPMC10592946 | biostudies-literature | 2023 Oct

REPOSITORIES: biostudies-literature

Publications

Joint inference of discrete cell types and continuous type-specific variability in single-cell datasets with MMIDAS.

Marghi Y, Gala R, Baftizadeh F, Sümbül U

bioRxiv: the preprint server for biology, 2024-07-02


Similar Datasets

| S-EPMC10224950 | biostudies-literature
| S-EPMC4177667 | biostudies-literature
| S-EPMC8132955 | biostudies-literature
2023-04-14 | GSE208620 | GEO
| S-EPMC11879432 | biostudies-literature
| S-EPMC9714196 | biostudies-literature
2019-11-17 | GSE139412 | GEO
| S-EPMC9048671 | biostudies-literature
| S-EPMC6784335 | biostudies-literature
| S-EPMC10939367 | biostudies-literature