MotivationAccurately clustering cell types from a mass of heterogeneous cells is a crucial first step for the analysis of single-cell RNA-seq (scRNA-Seq) data. Although several methods have been recently developed, they utilize different characteristics of data and yield varying results in terms of both the number of clusters and actual cluster assignments.
ResultsHere, we present SAFE-clustering, single-cell aggregated (From Ensemble) clustering, a flexible, accurate and robust method for clustering scRNA-Seq data. SAFE-clustering takes as input, results from multiple clustering methods, to build one consensus solution. SAFE-clustering currently embeds four state-of-the-art methods, SC3, CIDR, Seurat and t-SNE + k-means; and ensembles solutions from these four methods using three hypergraph-based partitioning algorithms. Extensive assessment across 12 datasets with the number of clusters ranging from 3 to 14, and the number of single cells ranging from 49 to 32, 695 showcases the advantages of SAFE-clustering in terms of both cluster number (18.2-58.1% reduction in absolute deviation to the truth) and cluster assignment (on average 36.0% improvement, and up to 18.5% over the best of the four methods, measured by adjusted rand index). Moreover, SAFE-clustering is computationally efficient to accommodate large datasets, taking <10 min to process 28 733 cells.
Availability and implementationSAFEclustering, including source codes and tutorial, is freely available at https://github.com/yycunc/SAFEclustering.
Supplementary informationSupplementary data are available at Bioinformatics online.