Dataset Information

Joint variational autoencoders for multimodal imputation and embedding.


ABSTRACT: Single-cell multimodal datasets measure various characteristics of individual cells, enabling a deep understanding of cellular and molecular mechanisms. However, generating multimodal data remains costly and challenging, and modalities are frequently missing. Machine learning approaches have recently been developed for data imputation, but they typically require fully matched modalities to learn common latent embeddings, which can lack modality specificity. To address these issues, we developed an open-source machine learning model, Joint Variational Autoencoders for multimodal Imputation and Embedding (JAMIE). JAMIE takes single-cell multimodal data in which samples may be only partially matched across modalities. Variational autoencoders learn the latent embeddings of each modality. Embeddings from matched samples are then aggregated across modalities to identify joint cross-modal latent embeddings before reconstruction. To perform cross-modal imputation, the latent embeddings of one modality are passed to the decoder of the other modality. For interpretability, Shapley values are used to prioritize input features for cross-modal imputation and for known sample labels. We applied JAMIE to both simulated data and emerging single-cell multimodal data, including gene expression, chromatin accessibility, and electrophysiology in human and mouse brains. In general, JAMIE significantly outperformed existing state-of-the-art methods and prioritized multimodal features for imputation, providing potentially novel mechanistic insights at cellular resolution.
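
The data flow described in the abstract (one variational autoencoder per modality, aggregation of embeddings for matched samples, and imputation by decoding one modality's embedding with the other modality's decoder) can be illustrated with a minimal NumPy sketch. This is not the JAMIE implementation: the `ToyVAE` class, the `aggregate_matched` helper, the single linear layers, the untrained random weights, the toy data, and the choice of simple averaging as the aggregation rule are all assumptions made for illustration, and no training loss is shown.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(n_in, n_out):
    """Random weights and zero bias for one linear layer (untrained)."""
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)


class ToyVAE:
    """Hypothetical single-layer VAE for one modality (illustration only)."""

    def __init__(self, n_features, n_latent):
        self.enc_mu = init_linear(n_features, n_latent)
        self.enc_logvar = init_linear(n_features, n_latent)
        self.dec = init_linear(n_latent, n_features)

    def encode(self, x):
        # Return the mean and log-variance of the approximate posterior.
        mu = x @ self.enc_mu[0] + self.enc_mu[1]
        logvar = x @ self.enc_logvar[0] + self.enc_logvar[1]
        return mu, logvar

    def sample(self, mu, logvar):
        # Reparameterization trick: z = mu + sigma * eps.
        return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

    def decode(self, z):
        return z @ self.dec[0] + self.dec[1]


def aggregate_matched(z_a, z_b, matched_a, matched_b):
    """Average embeddings of matched samples (assumed aggregation rule).

    Unmatched samples keep their modality-specific embeddings.
    """
    z_a, z_b = z_a.copy(), z_b.copy()
    joint = 0.5 * (z_a[matched_a] + z_b[matched_b])
    z_a[matched_a] = joint
    z_b[matched_b] = joint
    return z_a, z_b


# Toy data: modality A has 6 cells, modality B has 5; the first 3 are matched.
x_a = rng.normal(size=(6, 20))
x_b = rng.normal(size=(5, 12))
matched_a = np.array([0, 1, 2])
matched_b = np.array([0, 1, 2])

vae_a = ToyVAE(n_features=20, n_latent=4)
vae_b = ToyVAE(n_features=12, n_latent=4)

# Per-modality latent embeddings.
z_a = vae_a.sample(*vae_a.encode(x_a))
z_b = vae_b.sample(*vae_b.encode(x_b))

# Joint embeddings for matched samples, then reconstruction.
z_a_joint, z_b_joint = aggregate_matched(z_a, z_b, matched_a, matched_b)
recon_a = vae_a.decode(z_a_joint)
recon_b = vae_b.decode(z_b_joint)

# Cross-modal imputation: encode modality A, decode with modality B's decoder.
mu_a, _ = vae_a.encode(x_a)
imputed_b_from_a = vae_b.decode(mu_a)
```

In this sketch the imputation step uses the posterior mean rather than a sampled embedding so that the imputed values are deterministic; that is a design choice of the example, not something the abstract specifies.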

SUBMITTER: Kalafut NC 

PROVIDER: S-EPMC11340721 | biostudies-literature | 2023 Jun

REPOSITORIES: biostudies-literature

Publications

Joint variational autoencoders for multimodal imputation and embedding.

Noah Cohen Kalafut, Xiang Huang, Daifeng Wang

Nature Machine Intelligence, 29 May 2023, issue 6



Similar Datasets

| S-EPMC8605902 | biostudies-literature
| S-EPMC10802661 | biostudies-literature
| S-EPMC10590447 | biostudies-literature
| S-EPMC10553230 | biostudies-literature
| S-EPMC7946179 | biostudies-literature
| S-EPMC9246987 | biostudies-literature
| S-EPMC9813669 | biostudies-literature
| S-EPMC10148837 | biostudies-literature
| S-EPMC9710582 | biostudies-literature
| S-EPMC11210279 | biostudies-literature