Dataset Information

Multi-scored sleep databases: how to exploit the multiple-labels in automated sleep scoring.

ABSTRACT:

Study objectives

Inter-scorer variability in scoring polysomnograms is a well-known problem. Most existing automated sleep scoring systems are trained on labels annotated by a single scorer, whose subjective evaluation is transferred to the model. When annotations from two or more scorers are available, scoring models are usually trained on the scorer consensus. The averaged subjectivity of the scorers is then transferred into the model, and the information about the variability among scorers is lost. In this study, we aim to incorporate the knowledge of multiple physicians into the training procedure. The goal is to optimize model training by exploiting the full information that can be extracted from the consensus of a group of scorers.
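To make concrete what a consensus label keeps and what it discards, here is a minimal Python sketch. It assumes five AASM stages (W, N1, N2, N3, REM) and a plain majority vote with a fixed tie-break rule. The stage list, the tie-break rule, and the helper names `hard_consensus` and `soft_consensus` are illustrative assumptions, not the authors' code.

```python
from collections import Counter

# ASSUMPTION: five AASM sleep stages; this order fixes the vector layout.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def hard_consensus(votes):
    """Majority-vote label for one 30-s epoch (hypothetical helper).

    Ties are broken by the first stage in STAGES order, an assumption made
    for this sketch; real consensus rules may differ.
    """
    counts = Counter(votes)
    best = max(counts[s] for s in STAGES)
    return next(s for s in STAGES if counts[s] == best)


def soft_consensus(votes):
    """Fraction of scorers who chose each stage for one epoch (hypothetical helper)."""
    counts = Counter(votes)
    n = len(votes)
    return [counts[s] / n for s in STAGES]


# Hypothetical epoch scored by five scorers.
epoch_votes = ["N2", "N2", "N1", "N2", "N1"]
print(hard_consensus(epoch_votes))  # N2
print(soft_consensus(epoch_votes))  # [0.0, 0.4, 0.6, 0.0, 0.0]
```

The majority vote keeps only N2, whereas the soft consensus also records that two of the five scorers chose N1. That second distribution is the kind of inter-scorer information the study aims to retain.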

Methods

We train two lightweight deep learning-based models on three different multi-scored databases. We combine the label smoothing technique with a soft-consensus distribution (LSSC) to incorporate the knowledge of multiple scorers into the training procedure of the model. We introduce the averaged cosine similarity (ACS) metric to quantify the similarity between the hypnodensity-graph generated by the models trained with LSSC and the hypnodensity-graph generated by the scorer consensus.
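The two ingredients named above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it ASSUMES that LSSC replaces the uniform distribution of standard label smoothing with the per-epoch soft-consensus distribution, and that ACS is the mean over epochs of the cosine similarity between two per-epoch stage-probability vectors (a hypnodensity-graph). The smoothing factor `alpha` and all function names are hypothetical, and the paper's exact formulation may differ.

```python
import math

# ASSUMPTION: five AASM sleep stages, in a fixed order.
STAGES = ["W", "N1", "N2", "N3", "REM"]


def one_hot(stage):
    """One-hot vector over STAGES for a single consensus label."""
    return [1.0 if s == stage else 0.0 for s in STAGES]


def lssc_target(consensus_stage, soft, alpha=0.1):
    """Smoothed training target for one epoch (hypothetical helper).

    Standard label smoothing mixes the one-hot label with a uniform
    distribution; this sketch ASSUMES LSSC mixes it with the scorers'
    soft-consensus distribution instead. alpha is a hypothetical value.
    """
    hard = one_hot(consensus_stage)
    return [(1 - alpha) * h + alpha * p for h, p in zip(hard, soft)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def averaged_cosine_similarity(model_hypno, consensus_hypno):
    """ASSUMED ACS: mean per-epoch cosine similarity of two hypnodensity graphs.

    Each hypnodensity graph is a list of per-epoch probability vectors over STAGES.
    """
    if not model_hypno or len(model_hypno) != len(consensus_hypno):
        raise ValueError("hypnodensity graphs must be non-empty and cover the same epochs")
    sims = [cosine(m, c) for m, c in zip(model_hypno, consensus_hypno)]
    return sum(sims) / len(sims)


# Usage: target for an epoch whose consensus is N2 but where 2 of 5 scorers chose N1.
target = lssc_target("N2", [0.0, 0.4, 0.6, 0.0, 0.0], alpha=0.2)
print([round(x, 2) for x in target])  # [0.0, 0.08, 0.92, 0.0, 0.0]
```

Because both the one-hot label and the soft consensus sum to one, the mixed target also sums to one and can serve directly as the target distribution of a cross-entropy loss.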

Results

Training with LSSC improves the performance of the models on all the databases. We found an increase in ACS of up to 6.4% between the hypnodensity-graph generated by the models trained with LSSC and the hypnodensity-graph generated by the consensus.

Conclusion

Our approach enables a model to better adapt to the consensus of the group of scorers. Future work will further investigate different scoring architectures and, ideally, large-scale heterogeneous multi-scored datasets.

SUBMITTER: Fiorillo L

PROVIDER: S-EPMC10171642 | biostudies-literature | 2023 May

REPOSITORIES: biostudies-literature

Publications

Multi-scored sleep databases: how to exploit the multiple-labels in automated sleep scoring.

Luigi Fiorillo, Davide Pedroncelli, Valentina Agostini, Paolo Favaro, Francesca Dalia Faraci

Sleep, 2023 May 1; issue 5

PMID: 36762998

Similar Datasets

Project description:
Background: The advent of home sleep testing has allowed for the development of an ambulatory care model for OSA that most health-care providers can easily deploy. Although automated algorithms that accompany home sleep monitors can identify and classify disordered breathing events, it is unclear whether manual scoring followed by expert review of home sleep recordings is of any value. Thus, this study examined the agreement between automated and manual scoring of home sleep recordings.
Methods: Two type 3 monitors (ApneaLink Plus [ResMed] and Embletta [Embla Systems]) were examined in distinct study samples. Data from manual and automated scoring were available for 200 subjects. Two thresholds for oxygen desaturation (≥ 3% and ≥ 4%) were used to define disordered breathing events. Agreement between manual and automated scoring was examined using Pearson correlation coefficients and Bland-Altman analyses.
Results: Automated scoring consistently under-scored disordered breathing events compared with manual scoring for both sleep monitors, irrespective of whether a ≥ 3% or ≥ 4% oxygen desaturation threshold was used to define the apnea-hypopnea index (AHI). For the ApneaLink Plus monitor, Bland-Altman analyses revealed an average AHI difference between manual and automated scoring of 6.1 (95% CI, 4.9-7.3) and 4.6 (95% CI, 3.5-5.6) events/h for the ≥ 3% and ≥ 4% oxygen desaturation thresholds, respectively. Similarly, for the Embletta monitor, the average difference between manual and automated scoring was 5.3 (95% CI, 3.2-7.3) and 8.4 (95% CI, 7.2-9.6) events/h, respectively.
Conclusions: Although agreement between automated and manual scoring of home sleep recordings varies based on the device used, modest agreement was observed between the two approaches. However, manual review of home sleep test recordings can decrease the misclassification of OSA severity, particularly for those with mild disease.
Trial registry: ClinicalTrials.gov; No.: NCT01503164; www.clinicaltrials.gov.
