Automatic sleep spindle detection: benchmarking with fine temporal resolution using open science tools.
ABSTRACT: Sleep spindle properties index cognitive faculties such as memory consolidation and diseases such as major depression. For this reason, scoring sleep spindle properties in polysomnographic recordings has become an important activity in both research and clinical settings. The tediousness of this manual task has motivated efforts for its automation. Although some progress has been made, increasing the temporal accuracy of spindle scoring and improving the performance assessment methodology are two aspects needing more attention. In this paper, four open-access automated spindle detectors with fine temporal resolution are proposed and tested against expert scoring of two proprietary and two open-access databases. Results highlight several findings: (1) expert scoring and polysomnographic databases are important confounders when comparing the performance of spindle detectors tested using different databases or scorings; (2) because spindles are sparse events, specificity estimates are potentially misleading for assessing automated detector performance; (3) reporting the performance of spindle detectors exclusively with sensitivity and specificity estimates, as is often seen in the literature, is insufficient; including sensitivity, precision and a more comprehensive statistic such as the Matthews correlation coefficient, F1-score, or Cohen's κ is necessary for adequate evaluation; (4) reporting statistics for some reasonable range of decision thresholds provides a much more complete and useful benchmarking; (5) performance differences between tested automated detectors were found to be similar to those between available expert scorings; (6) much more development is needed to effectively compare the performance of spindle detectors developed by different research teams. Finally, this work clarifies a long-standing but seldom posed question regarding whether expert scoring truly is a reliable gold standard for sleep spindle assessment.
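Findings (2) and (3) can be illustrated with a minimal sketch that derives the statistics named above from confusion-matrix counts. The function name and the counts are hypothetical, chosen only to mimic a recording in which about 5% of samples belong to spindles; they are not results from the paper.

```python
from math import sqrt


def detection_metrics(tp, fp, fn, tn):
    """Compute common binary-detection statistics from confusion counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # recall on spindle samples
    specificity = tn / (tn + fp)          # recall on non-spindle samples
    precision = tp / (tp + fp)            # fraction of detections that are correct
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and sensitivity
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts: 100,000 samples, 5,000 of which are spindle samples.
metrics = detection_metrics(tp=4000, fp=4000, fn=1000, tn=91000)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, specificity is close to 0.96 even though half of all detections are false positives (precision 0.5). This is the sense in which specificity alone can flatter a detector when events are sparse.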
Project description: Sleep spindles are discrete, intermittent patterns of brain activity observed in human electroencephalographic data. Increasingly, these oscillations are of biological and clinical interest because of their role in development, learning and neurological disorders. We used an Internet interface to crowdsource spindle identification by human experts and non-experts, and we compared their performance with that of automated detection algorithms in data from middle- to older-aged subjects from the general population. We also refined methods for forming group consensus and evaluating the performance of event detectors in physiological data such as electroencephalographic recordings from polysomnography. Compared to the expert group consensus gold standard, the highest performance was by individual experts and the non-expert group consensus, followed by automated spindle detectors. This analysis showed that crowdsourcing the scoring of sleep data is an efficient method to collect large data sets, even for difficult tasks such as spindle identification. Further refinements to spindle detection algorithms are needed for middle- to older-aged subjects.
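The abstract does not say how the group consensus was formed. As one possible illustration only, the sketch below builds a consensus by thresholding the fraction of scorers who marked each sample as spindle. The function name, the per-sample representation, and the threshold values are all assumptions.

```python
def group_consensus(scorings, threshold=0.5):
    """Per-sample consensus: 1 where the fraction of scorers marking the
    sample as spindle reaches `threshold`, else 0.

    `scorings` is a list of equal-length 0/1 sequences, one per scorer.
    This voting rule is an illustrative assumption, not the study's method.
    """
    n_scorers = len(scorings)
    return [
        1 if sum(votes) / n_scorers >= threshold else 0
        for votes in zip(*scorings)
    ]


# Three hypothetical scorers marking five samples.
scorer_1 = [0, 1, 1, 1, 0]
scorer_2 = [0, 0, 1, 1, 1]
scorer_3 = [0, 1, 1, 0, 0]

majority = group_consensus([scorer_1, scorer_2, scorer_3], threshold=0.5)
unanimous = group_consensus([scorer_1, scorer_2, scorer_3], threshold=1.0)
```

Raising the threshold makes the consensus stricter: only samples marked by every scorer survive at a threshold of 1.0.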
Project description: To measure the inter-expert and intra-expert agreement in sleep spindle scoring, and to quantify how many experts are needed to build a reliable dataset of sleep spindle scorings. The EEG dataset comprised 400 randomly selected 115 s segments of stage 2 sleep from 110 sleeping subjects in the general population (57±8, range: 42-72 years). To assess expert agreement, a total of 24 Registered Polysomnographic Technologists (RPSGTs) scored spindles in a subset of the EEG dataset at a single electrode location (C3-M2). Intra-expert and inter-expert agreements were calculated as F1-scores, Cohen's kappa (κ), and intra-class correlation coefficient (ICC). We found an average intra-expert F1-score agreement of 72±7% (κ: 0.66±0.07). The average inter-expert agreement was 61±6% (κ: 0.52±0.07). Amplitude and frequency of discrete spindles were calculated with higher reliability than the estimation of spindle duration. Reliability of sleep spindle scoring can be improved by using qualitative confidence scores, rather than a dichotomous yes/no scoring system. We estimate that 2-3 experts are needed to build a spindle scoring dataset with 'substantial' reliability (κ: 0.61-0.8), and 4 or more experts are needed to build a dataset with 'almost perfect' reliability (κ: 0.81-1). Spindle scoring is a critical part of sleep staging, and spindles are believed to play an important role in development, aging, and diseases of the nervous system.
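An average inter-expert F1-score of the kind reported above could be computed roughly as follows. This sketch assumes per-sample binary scorings and averages the F1-score over all pairs of scorers. The abstract does not state whether agreement was computed by sample or by event, so that choice, the function names, and the toy data are all assumptions.

```python
from itertools import combinations


def f1_agreement(a, b):
    """F1-score between two binary per-sample scorings (symmetric in a, b)."""
    both = sum(1 for x, y in zip(a, b) if x and y)              # marked by both
    only_one = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))  # marked by one
    denom = 2 * both + only_one
    # Assumption: two empty scorings count as perfect agreement.
    return 2 * both / denom if denom else 1.0


def mean_pairwise_f1(scorings):
    """Average F1 agreement over every unordered pair of scorers."""
    pairs = list(combinations(scorings, 2))
    return sum(f1_agreement(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical scorers marking ten samples (1 = spindle).
scorer_a = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
scorer_b = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0]
scorer_c = [0, 1, 1, 0, 0, 0, 0, 1, 1, 0]

inter_expert_f1 = mean_pairwise_f1([scorer_a, scorer_b, scorer_c])
print(f"mean pairwise F1: {inter_expert_f1:.3f}")
```

Because F1 ignores samples that neither scorer marked, it is not inflated by the long stretches of spindle-free signal, which is one reason it suits sparse events.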
Project description: Home single-channel nasal pressure (HNP) may be an alternative to polysomnography (PSG) for obstructive sleep apnea (OSA) diagnosis, but no cost studies have yet been carried out. Automatic scoring is simpler but generally less effective than manual scoring. To determine the diagnostic efficacy and cost of both scorings (automatic and manual) compared with PSG, taking as a polysomnographic OSA diagnosis several apnea-hypopnea index (AHI) cutoff points. We included suspected OSA patients in a multicenter study. They were randomized to home and hospital protocols. We constructed receiver operating characteristic (ROC) curves for both scorings. Diagnostic efficacy was explored for several HNP AHI cutoff points, and costs were calculated for equally effective alternatives. Of 787 randomized patients, 752 underwent HNP. Manual scoring produced better ROC curves than automatic for AHI < 15; similar curves were obtained for AHI ≥ 15. A valid HNP with manual scoring would determine the presence of OSA (or otherwise) in 90% of patients with a polysomnographic AHI ≥ 5 cutoff point, in 74% of patients with a polysomnographic AHI ≥ 10 cutoff point, and in 61% of patients with a polysomnographic AHI ≥ 15 cutoff point. In the same way, a valid HNP with automatic scoring would determine the presence of OSA (or otherwise) in 73% of patients with a polysomnographic AHI ≥ 5 cutoff point, in 64% of patients with a polysomnographic AHI ≥ 10 cutoff point, and in 57% of patients with a polysomnographic AHI ≥ 15 cutoff point. The costs of both HNP approaches were 40% to 70% lower than those of PSG at the same level of diagnostic efficacy. Manual HNP had the lowest cost for low polysomnographic AHI levels (≥ 5 and ≥ 10), and manual and automatic scorings had similar costs for higher polysomnographic cutoff points (AHI ≥ 15) of diagnosis. Home single-channel nasal pressure (HNP) is a cheaper alternative than polysomnography for obstructive sleep apnea diagnosis.
HNP with manual scoring seems to have better diagnostic accuracy and a lower cost than automatic scoring for patients with low apnea-hypopnea index (AHI) levels, although automatic scoring has diagnostic accuracy and cost similar to those of manual scoring for intermediate and high AHI levels. Therefore, automatic scoring can be appropriately used, although diagnostic efficacy could improve if we carried out manual scoring on patients with AHI < 15. Clinicaltria