Dataset Information

Efficient cross-validation traversals in feature subset selection.

ABSTRACT: Sparse and robust classification models have the potential for revealing common predictive patterns that not only allow for categorizing objects into classes but also for generating mechanistic hypotheses. Identifying a small and informative subset of features is their main ingredient. However, the exponential search space of feature subsets and the heuristic nature of selection algorithms limit the coverage of these analyses, even for low-dimensional datasets. We present methods for reducing the computational complexity of feature selection criteria, allowing for higher efficiency and coverage of screenings. We achieve this by reducing the preparation costs of high-dimensional subsets [Formula: see text] to those of one-dimensional ones [Formula: see text]. Our methods are based on a tight interaction between a parallelizable cross-validation traversal strategy and distance-based classification algorithms and can be used with any product distance or kernel. As an example, we evaluate the traversal strategy in exhaustive feature subset selection experiments (perfect coverage). Its runtime, fitness landscape, and predictive performance are analyzed on publicly available datasets. Even in low-dimensional settings, we achieve an approximately 15-fold speedup in exhaustively generating distance matrices for feature combinations, bringing a new level of evaluations into reach.
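
The idea of paying only a one-dimensional cost per feature subset can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the squared Euclidean distance as an example of a distance that decomposes into a sum of per-feature terms, and the function names per_feature_sq_dists and subset_sq_dists are hypothetical. Each subset's distance matrix is obtained from its parent subset's matrix by adding a single precomputed one-dimensional matrix.

```python
import numpy as np


def per_feature_sq_dists(X):
    """Return D of shape (n_features, n, n) with D[j, a, b] = (X[a, j] - X[b, j]) ** 2.

    Assumption: squared Euclidean distance, which is a sum of per-feature terms.
    """
    cols = X.T  # shape (n_features, n_samples)
    diffs = cols[:, :, None] - cols[:, None, :]
    return diffs ** 2


def subset_sq_dists(D, max_size):
    """Yield (subset, distance matrix) for every non-empty subset up to max_size.

    Subsets are visited depth-first, so each matrix is the parent's matrix
    plus one per-feature matrix (one matrix addition per subset).
    """
    n_features = D.shape[0]

    def extend(subset, current):
        start = subset[-1] + 1 if subset else 0
        for j in range(start, n_features):
            new_subset = subset + (j,)
            new_matrix = current + D[j]  # incremental update, no recomputation
            yield new_subset, new_matrix
            if len(new_subset) < max_size:
                yield from extend(new_subset, new_matrix)

    yield from extend((), np.zeros(D.shape[1:]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 4))  # toy data: 6 samples, 4 features
    D = per_feature_sq_dists(X)
    for subset, matrix in subset_sq_dists(D, max_size=2):
        print(subset, matrix.shape)
```

In a cross-validation setting, a distance-based classifier such as k-nearest neighbors would only need rows and columns of each subset's matrix indexed by the training and test samples of a fold, so the same matrix can serve all folds. How the paper organizes this traversal in detail is not stated in the abstract.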

SUBMITTER: Lausser L

PROVIDER: S-EPMC9744898 | biostudies-literature | 2022 Dec

REPOSITORIES: biostudies-literature

Publications

Efficient cross-validation traversals in feature subset selection.

Lausser L, Szekely R, Schmid F, Maucher M, Kestler HA

Scientific Reports 2022-12-12; 1

PMID: 36509882

Similar Datasets

Fizzy: feature subset selection for metagenomics.
| S-EPMC4634798 | biostudies-literature

Feature Subset Selection for Cancer Classification Using Weight Local Modularity.
| S-EPMC5050509 | biostudies-literature

New feature subset selection procedures for classification of expression profiles.
| S-EPMC115205 | biostudies-literature

Enhancing data pipelines for forecasting student performance: integrating feature selection with cross-validation.
| S-EPMC8591701 | biostudies-literature

Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics.
| S-EPMC8613324 | biostudies-literature

A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure.
| S-EPMC6885241 | biostudies-literature

Characterizing efficient feature selection for single-cell expression analysis
| S-EPMC11229035 | biostudies-literature

Beta Distribution-Based Cross-Entropy for Feature Selection.
| S-EPMC7515297 | biostudies-literature

TCellR2Vec: efficient feature selection for TCR sequences for cancer classification.
| S-EPMC11622898 | biostudies-literature

An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF.
| S-EPMC9957060 | biostudies-literature