Dataset Information

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm.

ABSTRACT: Single-cell RNA sequencing technologies have enabled us to study tissue heterogeneity at cellular resolution. Fast-developing sequencing platforms, such as droplet-based sequencing, make it feasible to process thousands of single cells in parallel. Although unique molecular identifiers (UMIs) can remove amplification bias to a certain extent, clustering such sparse, high-dimensional, large-scale discrete data remains difficult. Most existing deep learning-based clustering methods use the mean squared error or a negative binomial distribution, with or without zero inflation, to denoise single-cell UMI count data, which may underfit or overfit the gene expression profiles. In addition, neglecting the molecule sampling mechanism and extracting representations by simple linear dimension reduction followed by a hard clustering algorithm may distort the data structure and lead to spurious analytical results. In this paper, we combine the deep autoencoder technique with statistical modeling and develop a novel and effective clustering method, scDMFK, for single-cell transcriptome UMI count data. scDMFK uses a multinomial distribution to characterize the data structure and relies on a neural network to facilitate model parameter estimation. In the learned low-dimensional latent space, we propose an adaptive fuzzy k-means algorithm with entropy regularization to perform soft clustering. Various simulation scenarios and the analysis of 10 real datasets show that scDMFK outperforms other state-of-the-art data modeling and clustering methods. Moreover, scDMFK scales well to large-scale single-cell datasets.
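The two statistical ingredients named in the abstract can be illustrated with a short sketch. This is not the scDMFK implementation: it assumes a decoder already supplies per-cell gene proportions (for the multinomial likelihood) and latent embeddings (for clustering), and it uses a generic entropy-regularized fuzzy k-means with a fixed regularization weight lam rather than the authors' adaptive variant. The function names, the farthest-first initialization, and all parameters are hypothetical choices made for illustration.

```python
import numpy as np


def multinomial_nll(counts, probs, eps=1e-10):
    """Negative multinomial log-likelihood of UMI counts, up to an additive constant.

    counts: (n_cells, n_genes) non-negative UMI counts.
    probs:  (n_cells, n_genes) per-cell gene proportions, rows summing to 1
            (hypothetical input, assumed to come from a decoder).
    The term log(N_i!) - sum_g log(x_ig!) does not depend on probs and is omitted.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(counts * np.log(probs + eps)))


def _memberships(z, centers, lam):
    # Squared Euclidean distance from every point to every center: shape (n, k).
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Closed-form minimizer of the entropy-regularized objective:
    # u_ik is proportional to exp(-d2_ik / lam), i.e. a row-wise softmax.
    logits = -d2 / lam
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    u = np.exp(logits)
    return u / u.sum(axis=1, keepdims=True)


def entropy_fuzzy_kmeans(z, k, lam=1.0, n_iter=100, tol=1e-6, seed=0):
    """Entropy-regularized fuzzy k-means with a fixed lam (not the adaptive variant).

    Minimizes sum_ik u_ik * ||z_i - c_k||^2 + lam * sum_ik u_ik * log(u_ik)
    subject to sum_k u_ik = 1 for every point i.
    """
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-first initialization (an assumption of this sketch).
    centers = [z[rng.integers(len(z))]]
    for _ in range(1, k):
        d2 = np.min([((z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(z[np.argmax(d2)])
    centers = np.array(centers)

    for _ in range(n_iter):
        u = _memberships(z, centers, lam)
        # Center update: membership-weighted mean of the points.
        new_centers = (u.T @ z) / u.sum(axis=0)[:, None]
        shift = np.max(np.abs(new_centers - centers))
        centers = new_centers
        if shift < tol:
            break
    return _memberships(z, centers, lam), centers
```

In this sketch a smaller lam yields harder, nearly one-hot memberships, while a larger lam pushes memberships toward uniform; the soft assignments u can be reduced to hard labels with u.argmax(axis=1) when needed.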

SUBMITTER: Chen L

PROVIDER: S-EPMC7180207 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm.

Liang Chen, Weinan Wang, Yuyao Zhai, Minghua Deng

Frontiers in Genetics, 2020-04-17

PMID: 32362908

Similar Datasets

Differential privacy fuzzy C-means clustering algorithm based on gaussian kernel function.
| S-EPMC7987176 | biostudies-literature

Fuzzy c-means clustering with prior biological knowledge.
| S-EPMC2673503 | biostudies-literature

Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.
| S-EPMC5154524 | biostudies-literature

A novel harmony search-K means hybrid algorithm for clustering gene expression data.
| S-EPMC3563403 | biostudies-literature

Hesitant Fuzzy Entropy-Based Opportunistic Clustering and Data Fusion Algorithm for Heterogeneous Wireless Sensor Networks.
| S-EPMC7038969 | biostudies-literature

An adaptive map-matching algorithm based on hierarchical fuzzy system from vehicular GPS data.
| S-EPMC5716534 | biostudies-literature

Marker-controlled watershed algorithm and fuzzy C-means clustering machine learning: automated segmentation of glioblastoma from MRI images in a case series
| S-EPMC10923355 | biostudies-literature

mbkmeans: Fast clustering for single cell data using mini-batch k-means.
| S-EPMC7864438 | biostudies-literature

A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation.
| S-EPMC3794507 | biostudies-literature

Clustering microbiome data using mixtures of logistic normal multinomial models.
| S-EPMC10484970 | biostudies-literature