Dataset Information

Semi-supervised consensus clustering for gene expression data analysis.

ABSTRACT:

Background

Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they are unable to deal with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge into the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and domain knowledge.

Methods

We proposed semi-supervised consensus clustering (SSCC), which integrates consensus clustering with semi-supervised clustering, for analyzing gene expression data. We investigated the roles of consensus clustering and prior knowledge in improving the quality of clustering. SSCC was compared with one semi-supervised clustering algorithm, one consensus clustering algorithm, and k-means. Experiments on eight gene expression datasets were performed using h-fold cross-validation.
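
The abstract does not say how SSCC combines the two ideas, so the following is only a minimal sketch of one generic possibility, not the authors' method. It assumes that base partitions are already available as label lists, that consensus is formed from a co-association matrix (the fraction of base partitions in which two samples share a cluster), and that prior knowledge arrives as must-link pairs. The function names, the 0.5 threshold, and the toy partitions are all hypothetical.

```python
def co_association(labelings):
    """Fraction of base partitions in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def add_must_links(matrix, must_link):
    """Assumed way to inject prior knowledge: force known pairs to co-associate."""
    adjusted = [row[:] for row in matrix]
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 1.0
    return adjusted


def consensus_partition(matrix, threshold=0.5):
    """Label connected components of the graph that links every pair of
    samples whose co-association exceeds the threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Three toy base partitions of six samples; sample 2 is split off
    # from samples 0 and 1 in two of the three runs.
    base_partitions = [
        [0, 0, 2, 1, 1, 1],
        [0, 0, 2, 1, 1, 1],
        [0, 0, 0, 1, 1, 1],
    ]
    matrix = co_association(base_partitions)
    print("consensus only:      ", consensus_partition(matrix))
    # Hypothetical prior knowledge: samples 0 and 2 belong together.
    constrained = add_must_links(matrix, [(0, 2)])
    print("with must-link (0,2):", consensus_partition(constrained))
```

In this toy setup the consensus step smooths over disagreement between runs, while the must-link pair corrects an assignment that the majority of runs got wrong. A real analysis would generate the base partitions by running a clustering algorithm repeatedly on the expression data.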

Results

Using prior knowledge improved the clustering quality by reducing the impact of noise and high dimensionality in microarray data. Integration of consensus clustering with semi-supervised clustering improved performance as compared to using consensus clustering or semi-supervised clustering separately. Our SSCC method outperformed the others tested in this paper.

SUBMITTER: Wang Y

PROVIDER: S-EPMC4036113 | biostudies-literature | 2014

REPOSITORIES: biostudies-literature

Publications

Semi-supervised consensus clustering for gene expression data analysis.

Wang Yunli, Pan Youlian

BioData Mining, 2014-05-08

PMID: 24920961

Similar Datasets

Non-supervised hierarchical clustering of gene expression data
2008-08-30 | GSE12627 | GEO

Semi-supervised methods to predict patient survival from gene expression data.
S-EPMC387275 | biostudies-literature

Semi-supervised oblique predictive clustering trees.
S-EPMC8101547 | biostudies-literature

MSC-CSMC: A multi-objective semi-supervised clustering algorithm based on constraints selection and multi-source constraints for gene expression data.
S-EPMC10008853 | biostudies-literature

scSemiAAE: a semi-supervised clustering model for single-cell RNA-seq data.
S-EPMC10214737 | biostudies-literature

Semi-Supervised Fuzzy Clustering with Feature Discrimination.
S-EPMC4556708 | biostudies-literature

Data-Driven Detection of Subclinical Keratoconus via Semi-Supervised Clustering of Multidimensional Corneal Biomarkers.
S-EPMC12756640 | biostudies-literature

Semi-supervised network inference using simulated gene expression dynamics.
S-EPMC6455938 | biostudies-literature

Subcellular proteome niche discovery using semi-supervised functional clustering.
S-EPMC12709497 | biostudies-literature

Data-driven Derivation and Validation of Novel Phenotypes for Acute Kidney Transplant Rejection using Semi-supervised Clustering.
S-EPMC8259675 | biostudies-literature