
Dataset Information


A novel machine learning strategy for model selections - Stepwise Support Vector Machine (StepSVM).


ABSTRACT: An essential aspect of medical research is the prediction of a health outcome and the scientific identification of important factors. As a result, numerous methods for model selection have been developed in recent years. In the era of big data, machine learning has been broadly adopted for data analysis. In particular, the Support Vector Machine (SVM) performs well in classification and prediction with high-dimensional data. In this research, a novel model selection strategy is proposed, named the Stepwise Support Vector Machine (StepSVM). The new strategy uses the SVM to conduct a modified stepwise selection, where the tuning parameter can be determined by 10-fold cross-validation that minimizes the mean squared error. Two popular methods, the conventional stepwise logistic regression model and the SVM Recursive Feature Elimination (SVM-RFE), were compared to the StepSVM. The stability and accuracy of the three strategies were evaluated by simulation studies with a complex hierarchical structure. Up to five variables were selected to predict the dichotomous cancer remission of a lung cancer patient. For the stepwise logistic regression, the mean C-statistic was 69.19%. The overall accuracy of the SVM-RFE was estimated at 70.62%. In contrast, the StepSVM provided the highest prediction accuracy of 80.57%. Although the StepSVM is more time-consuming, it is more consistent and outperforms the other two methods.
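
The abstract does not spell out the "modified" stepwise rule, so the following is only a rough sketch of the general idea: a forward-only stepwise search in which each candidate variable set is scored by the 10-fold cross-validated mean squared error of an SVM, with the tuning parameter chosen from a small grid and at most five variables kept. Everything here is an assumption made for illustration rather than the authors' implementation: the function names, the grid `lams`, the plain subgradient-descent linear SVM used as a stand-in for a full SVM solver, and the toy data (one informative column and two uninformative ones).

```python
import random

def train_linear_svm(X, y, lam, epochs=20, lr=0.01):
    """Linear soft-margin SVM fitted by subgradient descent on the hinge loss.
    Labels must be -1 or +1; lam is the L2 penalty (the tuning parameter here)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point violates the margin: hinge subgradient step
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # only the L2 penalty contributes
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(model, xi):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1

def cv_mse(X, y, cols, lam, k=10, seed=0):
    """k-fold cross-validated mean squared error of the predicted labels."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    sq_err = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = train_linear_svm([[X[i][c] for c in cols] for i in train],
                                 [y[i] for i in train], lam)
        sq_err += sum((y[i] - predict(model, [X[i][c] for c in cols])) ** 2
                      for i in test)
    return sq_err / len(X)

def forward_stepwise_svm(X, y, max_features=5, lams=(0.001, 0.01)):
    """Add one variable at a time while the best cross-validated MSE improves."""
    selected, best = [], float("inf")
    while len(selected) < max_features:
        step_best, step_col = best, None
        for c in range(len(X[0])):
            if c in selected:
                continue
            # Score the candidate set with the best tuning parameter in the grid.
            score = min(cv_mse(X, y, selected + [c], lam) for lam in lams)
            if score < step_best:
                step_best, step_col = score, c
        if step_col is None:  # no remaining variable lowers the CV error
            break
        selected.append(step_col)
        best = step_best
    return selected, best

# Hypothetical toy data: column 0 tracks the label, columns 1 and 2 do not.
n = 40
y = [1 if i % 2 == 0 else -1 for i in range(n)]
X = [[2.0 * y[i] + 0.1 * ((i % 5) - 2),
      1.0 if (i // 2) % 2 == 0 else -1.0,
      1.0 if (i // 4) % 2 == 0 else -1.0] for i in range(n)]

selected, mse = forward_stepwise_svm(X, y)
print("selected columns:", selected, "CV MSE:", mse)
```

Each step refits the SVM for every remaining variable, every grid value, and every fold, which gives a sense of why a stepwise SVM search is more time-consuming than a single SVM-RFE pass or a stepwise logistic regression.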

SUBMITTER: Guo CY 

PROVIDER: S-EPMC7451646 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

A novel machine learning strategy for model selections - Stepwise Support Vector Machine (StepSVM).

Guo Chao-Yu, Chou Yu-Chin

PLoS One, 2020-08-27, issue 8



Similar Datasets

| S-EPMC3264588 | biostudies-other
| S-EPMC4909287 | biostudies-literature
2020-11-05 | PXD020398 | Pride
| S-EPMC4669521 | biostudies-literature
| S-EPMC3737136 | biostudies-literature
| S-EPMC4394448 | biostudies-literature
| S-EPMC6868120 | biostudies-literature