Design of a Deep Learning Model for Automatic Scoring of Periodic and Non-Periodic Leg Movements during Sleep Validated against Multiple Human Experts
Objective: Manual scoring is currently the gold standard for scoring leg movements (LMs) and periodic LMs in sleep (PLMS) in overnight polysomnography (PSG) studies, but it is subject to inter-scorer variability. The objective of this study is to design and validate an end-to-end deep learning system for the automatic scoring of LMs and PLMS in sleep.
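The abstract does not state which periodicity criteria separate periodic from non-periodic LMs. For illustration only, the sketch below assumes a simplified form of the commonly used AASM rule: a PLM series consists of at least four consecutive LMs whose onset-to-onset intervals all lie between 5 and 90 s. The function name `label_periodic` and the `(onset_s, offset_s)` input format are hypothetical, and other scoring criteria (LM duration limits, merging of bilateral LMs, arousal handling) are deliberately omitted.

```python
from typing import List, Tuple

# Assumption: simplified AASM-style periodicity rule, not necessarily the study's criteria.
MIN_SERIES_LEN = 4      # minimum number of consecutive LMs in a PLM series
MIN_INTERVAL_S = 5.0    # minimum onset-to-onset interval (seconds)
MAX_INTERVAL_S = 90.0   # maximum onset-to-onset interval (seconds)


def label_periodic(lms: List[Tuple[float, float]]) -> List[bool]:
    """Hypothetical helper: flag each LM (onset_s, offset_s), sorted by onset,
    that belongs to a run of >= MIN_SERIES_LEN LMs with valid intervals."""
    periodic = [False] * len(lms)
    start = 0  # index of the first LM in the current candidate run
    for i in range(1, len(lms) + 1):
        # Extend the run while the interval to the previous LM is within the window.
        if i < len(lms):
            interval = lms[i][0] - lms[i - 1][0]
            if MIN_INTERVAL_S <= interval <= MAX_INTERVAL_S:
                continue
        # The run [start, i) has ended; mark it periodic if it is long enough.
        if i - start >= MIN_SERIES_LEN:
            for j in range(start, i):
                periodic[j] = True
        start = i
    return periodic
```

An isolated LM, or a run of fewer than four LMs, stays non-periodic under this rule.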
Methods: The deep learning system was developed, validated, and tested against manual annotations by expert technicians on 800 overnight PSGs, using a leg electromyography (EMG) channel as input. The data come from three cohorts: the Wisconsin Sleep Cohort (WSC), the Stanford Sleep Cohort (SSC), and the MrOS Sleep Study, an ancillary study of the Osteoporotic Fractures in Men Study (MrOS). The system's performance was further compared against individual expert technicians and existing PLM detectors.
Results: The system achieved F1 scores of 0.83, 0.71, and 0.77 on the WSC, SSC, and MrOS cohorts, respectively. On 60 PSGs from the WSC and SSC, each scored by nine expert technicians, the system performed better than two of the individual scorers and comparably to the other seven when each was evaluated against a majority-vote consensus of the remaining scorers. On 60 WSC PSGs scored accurately for PLMS, the system achieved an F1 score of 0.85 and outperformed four previous PLM detectors evaluated on the same data.
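The comparison against a majority-vote consensus of the remaining scorers can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's evaluation code: annotations are assumed to be binary per-sample masks, the consensus marks a sample positive when strictly more than half of the other scorers do, and a predicted LM counts as a true positive when it overlaps a not-yet-matched reference LM. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, int]  # (start, end) sample indices, end exclusive


def majority_consensus(masks: Sequence[Sequence[int]]) -> List[int]:
    """Assumption: a sample is positive if strictly more than half of the scorers mark it."""
    n = len(masks)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*masks)]


def mask_to_events(mask: Sequence[int]) -> List[Event]:
    """Convert a binary mask into contiguous positive runs (candidate LMs)."""
    events, start = [], None
    for i, v in enumerate(mask):
        if v and start is None:
            start = i
        elif not v and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(mask)))
    return events


def event_f1(pred: List[Event], ref: List[Event]) -> float:
    """Event-level F1; assumption: each reference event matches at most one overlapping prediction."""
    matched_ref = set()
    tp = 0
    for p_start, p_end in pred:
        for k, (r_start, r_end) in enumerate(ref):
            if k not in matched_ref and p_start < r_end and r_start < p_end:  # overlap
                matched_ref.add(k)
                tp += 1
                break
    fp = len(pred) - tp
    fn = len(ref) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # convention: two empty sets agree perfectly


def leave_one_out_f1(scorer_masks: Sequence[Sequence[int]],
                     model_mask: Sequence[int]) -> List[Tuple[float, float]]:
    """For each held-out scorer, return (scorer F1, model F1) against the other scorers' consensus."""
    results = []
    for i, held_out in enumerate(scorer_masks):
        others = [m for j, m in enumerate(scorer_masks) if j != i]
        ref = mask_to_events(majority_consensus(others))
        results.append((event_f1(mask_to_events(held_out), ref),
                        event_f1(mask_to_events(model_mask), ref)))
    return results
```

Scoring the model and the held-out scorer against the same consensus puts both on an equal footing, which is what allows a statement such as "better than two and comparable to seven scorers".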
Conclusions: The proposed system performs better than or comparably to individual expert technicians and outperforms previous automatic detectors. The study thus validates fully automatic methods for scoring LMs in sleep.