
Dataset Information


Optimization of diabetes prediction methods based on combinatorial balancing algorithm.


ABSTRACT:

Background

Diabetes, as a significant disease affecting public health, requires early detection for effective management and intervention. However, imbalanced datasets pose a challenge to accurate diabetes prediction. This imbalance often results in models performing poorly in predicting minority classes, affecting overall diagnostic performance.
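The accuracy paradox below shows why imbalance hurts minority-class prediction. It uses hypothetical labels with a 90:10 split chosen for illustration, not the study's data. A "model" that always predicts the majority class still reaches 90% accuracy while detecting none of the diabetic cases.

```python
# Hypothetical labels: 90 non-diabetic (0) and 10 diabetic (1) cases.
# These counts are illustrative only, not taken from the study's dataset.
y_true = [0] * 90 + [1] * 10

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class cases that the model detects."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")         # accuracy = 0.90
print(f"minority recall = {recall(y_true, y_pred):.2f}")    # minority recall = 0.00
```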

Objectives

To address this issue, this study combines the Synthetic Minority Over-sampling Technique (SMOTE) with Random Under-Sampling (RUS) to balance the data, and uses Optuna for hyperparameter optimization of machine learning models. This approach aims to fill a gap in current research concerning data balancing and model optimization, thereby improving prediction accuracy and computational efficiency.
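The following is a minimal, stdlib-only sketch of a combined SMOTE + RUS step. It is not the authors' implementation. SMOTE creates each synthetic minority point as x + u·(x_nn − x), where x_nn is one of the k nearest minority neighbours of x and u is uniform in [0, 1). RUS then randomly drops majority samples. The ratios (`over_ratio=0.5`, `under_ratio=1.0`), the neighbour count `k=5`, the helper names, and the toy data are all hypothetical; the abstract does not state the study's settings.

```python
import random

def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=5, rng=random):
    """Create n_new synthetic minority samples (needs >= 2 minority samples).

    Each sample interpolates between a randomly chosen minority point and
    one of its k nearest minority-class neighbours.
    """
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(base, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform draw in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic

def random_undersample(majority, n_keep, rng=random):
    """Keep a random subset of n_keep majority samples (no replacement)."""
    return rng.sample(majority, n_keep)

def smote_rus(X, y, over_ratio=0.5, under_ratio=1.0, k=5, seed=0):
    """Hypothetical two-stage balancing for binary labels (1 = minority).

    1. SMOTE the minority class up to over_ratio * n_majority samples.
    2. Randomly under-sample the majority class until
       n_minority / n_majority == under_ratio.
    """
    rng = random.Random(seed)
    minority = [x for x, t in zip(X, y) if t == 1]
    majority = [x for x, t in zip(X, y) if t == 0]
    target = max(len(minority), int(over_ratio * len(majority)))
    minority = minority + smote(minority, target - len(minority), k, rng)
    n_keep = min(len(majority), int(len(minority) / under_ratio))
    majority = random_undersample(majority, n_keep, rng)
    return majority + minority, [0] * len(majority) + [1] * len(minority)

# Hypothetical 2-D data: 90 majority points and 10 minority points.
gen = random.Random(42)
X = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(90)]
X += [[gen.gauss(3, 1), gen.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = smote_rus(X, y)
print(y_bal.count(0), y_bal.count(1))  # 45 45
```

Oversampling only partway and then undersampling limits how many synthetic points are invented and how much majority data is discarded. This is the usual rationale for combining the two methods rather than using either alone.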

Methods

First, the study applies SMOTE and RUS to the imbalanced diabetes dataset to balance its class distribution. Optuna is then used to optimize the hyperparameters of the LightGBM model and improve its performance. The effectiveness of the proposed approach is evaluated by comparing the results of training on the dataset before and after balancing.
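The hyperparameter-search loop can be sketched with the standard library alone. This uses plain random search, which is a stand-in and not what Optuna does by default: Optuna's default sampler (TPE) adapts its proposals to earlier trials. The search space, its ranges, and `toy_objective` are hypothetical. In practice the objective would train LightGBM with the sampled parameters and return a validation score.

```python
import math
import random

# Hypothetical search space over common LightGBM hyperparameters;
# the study's actual search space is not given in the abstract.
SEARCH_SPACE = {
    "num_leaves": (8, 256),          # integer range
    "learning_rate": (1e-3, 0.3),    # sampled on a log scale
    "min_child_samples": (5, 100),   # integer range
}

def sample_params(rng):
    """Draw one candidate configuration from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "num_leaves": rng.randint(*SEARCH_SPACE["num_leaves"]),
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
        "min_child_samples": rng.randint(*SEARCH_SPACE["min_child_samples"]),
    }

def random_search(objective, n_trials=50, seed=0):
    """Evaluate n_trials random configurations and keep the best (maximize)."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_objective(params):
    """Hypothetical stand-in for 'train LightGBM and return validation
    accuracy'; it simply peaks near num_leaves=64, learning_rate=0.05."""
    return (-((params["num_leaves"] - 64) / 64) ** 2
            - (math.log10(params["learning_rate"]) - math.log10(0.05)) ** 2)

best, score = random_search(toy_objective, n_trials=200)
print(best, score)
```

With Optuna itself, the loop is replaced by `optuna.create_study(direction="maximize")` and `study.optimize(objective, n_trials=...)`. There the objective receives a `trial` and samples values with calls such as `trial.suggest_int("num_leaves", 8, 256)` and `trial.suggest_float("learning_rate", 1e-3, 0.3, log=True)`.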

Results

The experimental results show that the enhanced LightGBM-Optuna model improves the accuracy from 97.07% to 97.11%, and the precision from 97.17% to 98.99%. The time required for a single search is only 2.5 seconds. These results demonstrate the superiority of the proposed method in handling imbalanced datasets and optimizing model performance.
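The reported figures can be read against the standard metric definitions (TP, TN, FP, FN are true/false positives/negatives, assuming "positive" means diabetic). The calculation below uses only the numbers reported above. The accuracy change is 0.04 percentage points. The precision change is 1.82 points, so the share of positive predictions that are false (1 − precision) falls by about 64% relative to its earlier value.

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}
\]
\[
\Delta_{\text{acc}} = 97.11\% - 97.07\% = 0.04 \text{ pp}, \qquad
\Delta_{\text{prec}} = 98.99\% - 97.17\% = 1.82 \text{ pp}
\]
\[
\frac{(100 - 97.17) - (100 - 98.99)}{100 - 97.17}
  = \frac{2.83 - 1.01}{2.83} \approx 0.643
\]
```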

Conclusions

The study indicates that combining SMOTE and RUS data balancing algorithms with Optuna for hyperparameter optimization can effectively enhance machine learning models, especially in dealing with imbalanced datasets for diabetes prediction.

SUBMITTER: Shao H 

PROVIDER: S-EPMC11324958 | biostudies-literature | 2024 Aug

REPOSITORIES: biostudies-literature


Publications

Optimization of diabetes prediction methods based on combinatorial balancing algorithm.

Shao HuiZhi, Liu Xiang, Zong DaShuai, Song QingJun

Nutrition & Diabetes, 2024-08-14, 1



Similar Datasets

| S-EPMC4151488 | biostudies-other
| S-EPMC8330920 | biostudies-literature
| S-EPMC5357838 | biostudies-literature
| S-EPMC10363121 | biostudies-literature
2012-05-09 | E-GEOD-37858 | biostudies-arrayexpress
| S-EPMC10967787 | biostudies-literature
2012-05-10 | GSE37858 | GEO
2021-04-23 | GSE173083 | GEO
2022-05-16 | GSE189510 | GEO
| S-EPMC8938491 | biostudies-literature