Dataset Information

Clustering gene expression data with a penalized graph-based metric.


ABSTRACT:

BACKGROUND: The search for cluster structure in microarray datasets is a basic problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e. data that do not form compact clouds of points but instead follow arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets.

RESULTS: In this work we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low k-value, and then it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as several penalization functions. We show clustering results on several public gene expression datasets and on simulated artificial problems to evaluate the behavior of the new metric.

CONCLUSIONS: In all cases the PKNNG metric shows promising clustering results. Using the PKNNG metric can raise the performance of commonly used pairwise-distance based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
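The two-step procedure described in the abstract can be illustrated with a minimal pure-Python sketch. Several details here are assumptions rather than the authors' implementation: the function name `pknng_distances` is hypothetical, components are joined through the closest pair of points between every pair of subgraphs (only one of the possible connection schemes), the penalty is taken to be exponential, `d * exp(d / mu)` with `mu` the mean edge length of the k-NN graph, and the final distance between two points is the shortest-path length in the resulting graph.

```python
import heapq
import math
from itertools import combinations


def euclid(a, b):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pknng_distances(points, k=2):
    """Illustrative PKNNG-style distance matrix (assumed details, see lead-in)."""
    n = len(points)
    d = [[euclid(p, q) for q in points] for p in points]

    # Step 1: symmetric k-nearest-neighbor graph with a low k.
    graph = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            graph[i][j] = d[i][j]
            graph[j][i] = d[i][j]

    # Mean edge length, used as the scale of the (assumed) exponential penalty.
    lengths = [w for i in range(n) for j, w in graph[i].items() if i < j]
    mu = sum(lengths) / len(lengths) if lengths else 1.0

    # Label the connected subgraphs produced by step 1.
    label = [-1] * n
    n_comp = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = n_comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if label[v] == -1:
                    label[v] = n_comp
                    stack.append(v)
        n_comp += 1

    # Step 2: join every pair of subgraphs through their closest pair of
    # points, using a heavily penalized weight (assumed: d * exp(d / mu)).
    for a, b in combinations(range(n_comp), 2):
        i, j = min(
            ((i, j) for i in range(n) if label[i] == a
             for j in range(n) if label[j] == b),
            key=lambda pair: d[pair[0]][pair[1]],
        )
        w = d[i][j] * math.exp(d[i][j] / mu)
        graph[i][j] = w
        graph[j][i] = w

    # Distances are shortest-path lengths in the connected graph (Dijkstra).
    dist = []
    for s in range(n):
        best = [math.inf] * n
        best[s] = 0.0
        heap = [(0.0, s)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > best[u]:
                continue
            for v, w in graph[u].items():
                if du + w < best[v]:
                    best[v] = du + w
                    heapq.heappush(heap, (best[v], v))
        dist.append(best)
    return dist


# Two well-separated groups on a line: within-group distances stay small,
# while between-group distances are inflated by the penalized edge.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
D = pknng_distances(points, k=1)
print(D[0][2], D[0][5])
```

The resulting matrix `D` can be passed to any clustering method that accepts precomputed pairwise distances, which is the usage the abstract suggests for hierarchical clustering. The sketch assumes distinct points, so that the mean edge length is nonzero.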

SUBMITTER: Baya AE 

PROVIDER: S-EPMC3023695 | biostudies-literature | 2011 Jan

REPOSITORIES: biostudies-literature

Publications

Clustering gene expression data with a penalized graph-based metric.

Bayá Ariel E, Granitto Pablo M

BMC Bioinformatics, 2011-01-04


Similar Datasets

| S-EPMC3072547 | biostudies-literature
| S-EPMC9293048 | biostudies-literature
| S-EPMC6394398 | biostudies-literature
| S-EPMC7162352 | biostudies-literature
| S-EPMC3180043 | biostudies-literature
| S-EPMC2867492 | biostudies-literature
| S-EPMC6364860 | biostudies-literature
| S-EPMC7703756 | biostudies-literature
| S-EPMC2912890 | biostudies-literature
| S-EPMC3118357 | biostudies-literature