
Dataset Information


Efficient cross-validation traversals in feature subset selection.


ABSTRACT: Sparse and robust classification models have the potential for revealing common predictive patterns that not only allow for categorizing objects into classes but also for generating mechanistic hypotheses. Identifying a small and informative subset of features is their main ingredient. However, the exponential search space of feature subsets and the heuristic nature of selection algorithms limit the coverage of these analyses, even for low-dimensional datasets. We present methods for reducing the computational complexity of feature selection criteria allowing for higher efficiency and coverage of screenings. We achieve this by reducing the preparation costs of high-dimensional subsets [Formula: see text] to those of one-dimensional ones [Formula: see text]. Our methods are based on a tight interaction between a parallelizable cross-validation traversal strategy and distance-based classification algorithms and can be used with any product distance or kernel. We evaluate the traversal strategy exemplarily in exhaustive feature subset selection experiments (perfect coverage). Its runtime, fitness landscape, and predictive performance are analyzed on publicly available datasets. Even in low-dimensional settings, we achieve approximately a 15-fold increase in exhaustively generating distance matrices for feature combinations bringing a new level of evaluations into reach.
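The idea of reducing the preparation cost of a multi-feature subset to that of single features can be illustrated with a small sketch. It is not the authors' implementation: it assumes the squared Euclidean distance as one example of a distance that decomposes over features, and the names `per_feature_sq_dists`, `subset_sq_dists`, and `direct_sq_dists` are hypothetical helpers introduced only for this illustration. The per-feature matrices are computed once, and the matrix for any feature subset is then obtained by summing the cached matrices.

```python
from itertools import combinations

def per_feature_sq_dists(X):
    """Compute one n x n matrix of squared differences per feature (done once)."""
    n, d = len(X), len(X[0])
    return [
        [[(X[i][f] - X[j][f]) ** 2 for j in range(n)] for i in range(n)]
        for f in range(d)
    ]

def subset_sq_dists(per_feature, subset):
    """Squared Euclidean distances for a feature subset, built by summing
    the cached one-dimensional matrices instead of revisiting the raw data."""
    n = len(per_feature[0])
    return [
        [sum(per_feature[f][i][j] for f in subset) for j in range(n)]
        for i in range(n)
    ]

def direct_sq_dists(X, subset):
    """Reference computation straight from the data, used only for checking."""
    n = len(X)
    return [
        [sum((X[i][f] - X[j][f]) ** 2 for f in subset) for j in range(n)]
        for i in range(n)
    ]

# Toy data (made up for illustration): 4 samples, 3 features.
X = [[0.0, 1.0, 2.0], [1.0, 3.0, 0.0], [4.0, 1.0, 1.0], [2.0, 2.0, 5.0]]

cache = per_feature_sq_dists(X)

# Exhaustive enumeration of all non-empty feature subsets.
all_subsets = [s for k in range(1, 4) for s in combinations(range(3), k)]
matrices = {s: subset_sq_dists(cache, s) for s in all_subsets}
```

Under this assumption, a distance-based classifier such as k-nearest neighbours can evaluate a cross-validation fold by indexing rows and columns of the subset's matrix, so the matrix does not have to be rebuilt for each fold.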

SUBMITTER: Lausser L 

PROVIDER: S-EPMC9744898 | biostudies-literature | 2022 Dec

REPOSITORIES: biostudies-literature


Publications

Efficient cross-validation traversals in feature subset selection.

Lausser L, Szekely R, Schmid F, Maucher M, Kestler HA

Scientific Reports, 2022 Dec 12, issue 1



Similar Datasets

| S-EPMC4634798 | biostudies-literature
| S-EPMC8591701 | biostudies-literature
| S-EPMC5050509 | biostudies-literature
| S-EPMC115205 | biostudies-literature
| S-EPMC8613324 | biostudies-literature
| S-EPMC6101392 | biostudies-literature
| S-EPMC4054616 | biostudies-other
| S-EPMC6885241 | biostudies-literature
| S-EPMC11229035 | biostudies-literature
| S-EPMC7515297 | biostudies-literature