Dataset Information

Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment.


ABSTRACT:

Goal: Smartphones can be used to passively assess and monitor patients' speech impairments caused by ailments such as Parkinson's disease, Traumatic Brain Injury (TBI), Post-Traumatic Stress Disorder (PTSD) and neurodegenerative diseases such as Alzheimer's disease and dementia. However, passive audio recordings in natural settings often capture the speech of non-target speakers (cross-talk). Consequently, speaker separation, which identifies the target speakers' speech in audio recordings with two or more speakers' voices, is a crucial pre-processing step in such scenarios. Prior speech separation methods analyzed raw audio. However, in order to preserve speaker privacy, passively recorded smartphone audio and machine learning-based speech assessment are often performed on derived speech features such as Mel-Frequency Cepstral Coefficients (MFCCs). In this paper, we propose a novel Deep MFCC bAsed SpeaKer Separation (Deep-MASKS).

Methods: Deep-MASKS uses an autoencoder to reconstruct MFCC components of an individual's speech from an i-vector, x-vector or d-vector representation of their speech learned during the enrollment period. Deep-MASKS utilizes a Deep Neural Network (DNN) for MFCC signal reconstructions, which yields a more accurate, higher-order function compared to prior work that utilized a mask. Unlike prior work that operates on utterances, Deep-MASKS operates on continuous audio recordings.

Results: Deep-MASKS outperforms baselines, reducing the Mean Squared Error (MSE) of MFCC reconstruction by up to 44% and the number of additional bits required to represent clean speech entropy by 36%.
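The abstract reports a relative reduction in the MSE of MFCC reconstruction. As a minimal sketch of how such a figure can be computed, the snippet below compares a clean MFCC matrix against two reconstructions. The function names `mfcc_mse` and `relative_reduction`, the list-of-frames layout, and all numeric values are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
def mfcc_mse(reference, estimate):
    """Mean squared error between two MFCC matrices given as lists of frames."""
    if len(reference) != len(estimate):
        raise ValueError("frame counts differ")
    total = 0.0
    count = 0
    for ref_frame, est_frame in zip(reference, estimate):
        if len(ref_frame) != len(est_frame):
            raise ValueError("coefficient counts differ")
        for r, e in zip(ref_frame, est_frame):
            total += (r - e) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Toy values invented for illustration only (2 frames x 3 coefficients).
clean = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5]]
baseline = [[2.0, 2.0, 3.0], [0.0, 0.0, 0.5]]   # hypothetical baseline output
model = [[1.5, 2.0, 3.0], [0.0, -0.5, 0.5]]     # hypothetical model output

baseline_mse = mfcc_mse(clean, baseline)
model_mse = mfcc_mse(clean, model)
print(f"relative MSE reduction: {relative_reduction(baseline_mse, model_mse):.2%}")
```

Under this reading, a reported reduction of "up to 44%" would correspond to `relative_reduction` returning about 0.44 for the best-case comparison against a baseline.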

SUBMITTER: Ditthapron A 

PROVIDER: S-EPMC8940203 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment.

Ditthapron Apiwat, Agu Emmanuel O, Lammert Adam C

IEEE Open Journal of Engineering in Medicine and Biology, 2021-03-04

Similar Datasets

| S-EPMC11424628 | biostudies-literature
| S-EPMC7144575 | biostudies-literature
| S-EPMC7654633 | biostudies-literature
| S-EPMC7041894 | biostudies-literature
| S-EPMC6095666 | biostudies-literature
| S-EPMC11531524 | biostudies-literature
| S-EPMC9455055 | biostudies-literature
| S-EPMC8385275 | biostudies-literature
| S-EPMC7084661 | biostudies-literature
| S-EPMC3932473 | biostudies-literature