Dataset Information

A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations.


ABSTRACT: The advent of high-dimensional single-cell data has necessitated the development of dimensionality-reduction tools. t-Distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP) are the two most frequently used approaches, allowing clear visualization of complex single-cell datasets. Despite the need for quantitative comparison, t-SNE and UMAP have largely remained visualization tools due to the lack of robust statistical approaches. Here, we have derived a statistical test for evaluating the difference between dimensionality-reduced datasets using the Kolmogorov-Smirnov test on the distributions of cross entropy of single cells within each dataset. As the approach uses the inter-relationship of single cells for comparison, the resulting statistic is robust and capable of identifying true biological variation. Further, the test provides a valid distance between single-cell datasets, allowing the organization of multiple samples into a dendrogram for quantitative comparison of complex datasets. These results demonstrate the largely untapped potential of dimensionality-reduction tools for biomedical data analysis beyond visualization.
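
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each cell has a row of neighbour probabilities in the original space (`p_row`) and in the embedding (`q_row`), and that the per-cell cross entropy takes the form -sum_j p_j log q_j; the abstract does not give the exact formula. The function names and the toy probabilities are hypothetical. Only the Kolmogorov-Smirnov statistic is computed here, not its p-value.

```python
import math
from bisect import bisect_right


def point_cross_entropy(p_row, q_row, eps=1e-12):
    """Cross entropy -sum_j p_j * log(q_j) for one cell (assumed form)."""
    # q is clamped at eps so that log(0) is never evaluated.
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_row, q_row) if p > 0)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Toy neighbour probabilities in the original space (hypothetical values).
p_rows = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
# Sample A: the embedding reproduces the neighbourhoods exactly.
q_rows_a = [list(row) for row in p_rows]
# Sample B: the embedding reverses the neighbour ranking.
q_rows_b = [list(reversed(row)) for row in p_rows]

ce_a = [point_cross_entropy(p, q) for p, q in zip(p_rows, q_rows_a)]
ce_b = [point_cross_entropy(p, q) for p, q in zip(p_rows, q_rows_b)]

d = ks_statistic(ce_a, ce_b)
print(f"KS statistic between cross-entropy distributions: {d:.3f}")
```

Because the statistic lies between 0 and 1, it can also be read as a dissimilarity between two samples, which is the property the abstract relies on when arranging several samples into a dendrogram. For a p-value, a library routine such as `scipy.stats.ks_2samp` is the usual choice.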

SUBMITTER: Roca CP 

PROVIDER: S-EPMC9939422 | biostudies-literature | 2023 Jan

REPOSITORIES: biostudies-literature

Publications

A cross entropy test allows quantitative statistical comparison of t-SNE and UMAP representations.

Roca, Carlos P.; Burton, Oliver T.; Neumann, Julika; Tareen, Samar; Whyte, Carly E.; Gergelits, Vaclav; Veiga, Rafael V.; Humblet-Baron, Stéphanie; Liston, Adrian

Cell Reports Methods, 2023-01-13, issue 1


Similar Datasets

Date | Accession | Repository
| S-EPMC10897166 | biostudies-literature
| S-EPMC8091681 | biostudies-literature
| S-EPMC6560168 | biostudies-literature
2011-08-10 | GSE26736 | GEO
2011-08-10 | GSE26732 | GEO
2011-08-10 | GSE26735 | GEO
| S-EPMC2916828 | biostudies-literature
2011-08-09 | E-GEOD-26736 | biostudies-arrayexpress
2011-08-09 | E-GEOD-26735 | biostudies-arrayexpress
2011-08-09 | E-GEOD-26732 | biostudies-arrayexpress