Dataset Information

The impacts of active and self-supervised learning on efficient annotation of single-cell expression data.


ABSTRACT: A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version), which shows performance competitive with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work is available at https://github.com/camlab-bioml/leader.

SUBMITTER: Geuenich MJ 

PROVIDER: S-EPMC10837127 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature

Publications

The impacts of active and self-supervised learning on efficient annotation of single-cell expression data.

Geuenich Michael J, Gong Dae-Won, Campbell Kieran R

Nature Communications, 2024 Feb 3, issue 1


Similar Datasets

| S-EPMC7397036 | biostudies-literature
| S-EPMC10002629 | biostudies-literature
| S-EPMC8529514 | biostudies-literature
| S-EPMC10495322 | biostudies-literature
| S-EPMC9487595 | biostudies-literature
| S-EPMC10493897 | biostudies-literature
| S-EPMC11227494 | biostudies-literature
| S-EPMC7514320 | biostudies-literature
| S-EPMC11568879 | biostudies-literature
| S-EPMC10440826 | biostudies-literature