
Dataset Information


Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information.


ABSTRACT: Deconvolution of bulk transcriptomic data from mixed cell populations is vital for identifying the cellular mechanisms of complex diseases. Existing deconvolution approaches fall into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific prior information, such as cell proportions, reference cell type-specific gene signatures, or marker genes for each cell type, which may not be available in practice. Unsupervised methods, such as non-negative matrix factorization (NMF) and Convex Analysis of Mixtures (CAM), in contrast, disregard prior information entirely and are therefore inefficient for data where cell type-specific information is only partially available. In this paper, we propose a semi-supervised deconvolution method, semi-CAM, that extends CAM by utilizing marker information from a subset of cell types. Analyses of simulated data and two benchmark datasets demonstrate that semi-CAM outperforms CAM, yielding more accurate cell proportion estimates when markers from some or all cell types are available. In addition, when markers from all cell types are available, semi-CAM achieves better or similar accuracy compared with the signature gene-based supervised method CIBERSORT and the marker-based supervised methods semi-NMF and DSA. Furthermore, analysis of human chlamydia infection data, with bulk expression profiles from six cell types and prior marker information for only three of them, suggests that semi-CAM achieves more accurate cell proportion estimates than CAM.
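The methods compared in the abstract share the linear mixing model of bulk deconvolution: each bulk expression profile is approximately a non-negative combination of cell type-specific profiles, weighted by cell proportions. The sketch below illustrates only the supervised case, assuming a fully known reference signature matrix. The data are simulated and hypothetical, the function name `estimate_proportions` is illustrative, and the solver is plain non-negative least squares (projected gradient descent). It is not semi-CAM, CAM, or CIBERSORT, which uses support vector regression.

```python
import numpy as np


def estimate_proportions(signature, bulk, n_iter=5000):
    """Hypothetical helper: estimate proportions p >= 0 minimizing
    ||signature @ p - bulk||^2, then rescale them to sum to one.

    signature: genes x cell types reference matrix (assumed known).
    bulk: bulk expression vector over the same genes.
    """
    # Step size 1/L, where L is the largest eigenvalue of signature.T @ signature
    # (the squared spectral norm), which guarantees convergence for this quadratic.
    lipschitz = np.linalg.norm(signature, ord=2) ** 2
    p = np.zeros(signature.shape[1])
    for _ in range(n_iter):
        grad = signature.T @ (signature @ p - bulk)
        # Gradient step, then projection onto the non-negative orthant.
        p = np.maximum(p - grad / lipschitz, 0.0)
    # Rescale so the estimated cell proportions sum to one.
    return p / p.sum()


# Hypothetical simulated data: 50 genes x 3 cell types, noise-free mixture.
rng = np.random.default_rng(0)
signature = rng.uniform(0.0, 10.0, size=(50, 3))
true_p = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_p

p_hat = estimate_proportions(signature, bulk)
print(np.round(p_hat, 3))
```

With noisy bulk data, the fitted weights no longer sum exactly to one, which is why the final rescaling step is included. Semi-CAM, by contrast, is designed for settings where no full reference signature is available and markers are known for only some cell types.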

SUBMITTER: Dong L 

PROVIDER: S-EPMC7096458 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature


Publications

Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information.

Li Dong, Avinash Kollipara, Toni Darville, Fei Zou, Xiaojing Zheng

Scientific Reports, 25 March 2020, issue 1



Similar Datasets

| S-EPMC6678337 | biostudies-literature
2022-09-03 | GSE168264 | GEO
2019-11-13 | GSE140262 | GEO
| S-EPMC7856146 | biostudies-literature
| S-EPMC2266799 | biostudies-literature
| S-EPMC3646965 | biostudies-literature
| S-EPMC6787340 | biostudies-literature