Unknown

Dataset Information

0

Seriation Using Tree-penalized Path Length.


ABSTRACT: Given a sample of n data points and an n by n dissimilarity matrix, data seriation methods produce a linear ordering of the objects, putting similar objects nearby in the ordering. One may visualize the reordered dissimilarity matrix with a heat map and thus understand the structure of the data, while still displaying the full matrix of dissimilarities. Good orderings produce heat maps that are easy to read and allow for clear interpretation. We consider two popular seriation methods, minimizing path length by solving the Traveling Salesman Problem (TSP), and Optimal Leaf Ordering (OLO), which minimizes path length among all orderings consistent with a given tree structure. Learning from the strengths and weaknesses of the two methods, we introduce a new hybrid seriation method, tree-penalized Path Length (tpPL). The objective is a linear combination of path length and the extent of violations of the tree structure, with a parameter that transitions the optimal paths smoothly from TSP to OLO. We present a detailed study over 44 synthetic datasets which are designed to bring out the strengths and weaknesses of the three methods, finding that the hybrid nature of tpPL enables it to overcome the weaknesses of TSP and OLO.

SUBMITTER: Aliyev DA 

PROVIDER: S-EPMC9642984 | biostudies-literature | 2023 Mar

REPOSITORIES: biostudies-literature

altmetric image

Publications

Seriation Using Tree-penalized Path Length.

Aliyev Denis A DA   Zirbel Craig L CL  

European journal of operational research 20220617 2


Given a sample of <i>n</i> data points and an <i>n</i> by <i>n</i> dissimilarity matrix, data seriation methods produce a linear ordering of the objects, putting similar objects nearby in the ordering. One may visualize the reordered dissimilarity matrix with a heat map and thus understand the structure of the data, while still displaying the full matrix of dissimilarities. Good orderings produce heat maps that are easy to read and allow for clear interpretation. We consider two popular seriatio  ...[more]

Similar Datasets

| S-EPMC3081869 | biostudies-other
| S-EPMC4958949 | biostudies-literature
| S-EPMC3272679 | biostudies-literature
| S-EPMC3741681 | biostudies-literature
| S-EPMC5482830 | biostudies-literature
| S-EPMC2233645 | biostudies-literature
| S-EPMC2585631 | biostudies-other
| S-EPMC5771864 | biostudies-literature
| S-EPMC10653492 | biostudies-literature
| S-EPMC6766775 | biostudies-literature