Project description:Speech Emotion Recognition (SER) identifies and categorizes emotional states by analyzing speech signals. SER is an emerging research area using machine learning and deep learning techniques due to its socio-cultural and business importance. An appropriate dataset is an important resource for SER related studies in a particular language. There is an apparent lack of SER datasets in Bangla language although it is one of the most spoken languages in the world. There are a few Bangla SER datasets but those consist of only a few dialogs with a minimal number of actors making them unsuitable for real-world applications. Moreover, the existing datasets do not consider the intensity level of emotions. The intensity of a specific emotional expression, such as anger or sadness, plays a crucial role in social behavior. Therefore, a realistic Bangla speech dataset is developed in this study which is called KUET Bangla Emotional Speech (KBES) dataset. The dataset consists of 900 audio signals (i.e., speech dialogs) from 35 actors (20 females and 15 males) with diverse age ranges. Source of the speech dialogs are Bangla Telefilm, Drama, TV Series, Web Series. There are five emotional categories: Neutral, Happy, Sad, Angry, and Disgust. Except Neutral, samples of a particular emotion are divided into two intensity levels: Low and High. The significant issue of the dataset is that the speech dialogs are almost unique with relatively large number of actors; whereas, existing datasets (such as SUBESCO and BanglaSER) contain samples with repeatedly spoken of a few pre-defined dialogs by a few actors/research volunteers in the laboratory environment. Finally, the KBES dataset is exposed as a nine-class problem to classify emotions into nine categories: Neutral, Happy (Low), Happy (High), Sad (Low), Sad (High), Angry (Low), Angry (High), Disgust (Low) and Disgust (High). However, the dataset is kept symmetrical containing 100 samples for each of the nine classes; 100 samples are also gender balanced with 50 samples for male/female actors. The developed dataset seems a realistic dataset while compared with the existing SER datasets.

Project description:The most complex interactions between human beings occur through speech, and often in the presence of background noise. Understanding speech in noisy environments requires the integrity of highly integrated and widespread auditory networks likely to be impacted by multiple sclerosis (MS) related neurogenic injury. Despite the impact auditory communication has on a person's ability to navigate the world, build relationships, and maintain employability; studies of speech-in-noise (SiN) perception in people with MS (pwMS) have been minimal to date. Thus, this paper presents a dataset related to the acquisition of pure-tone thresholds, SiN performance and questionnaire responses in age-matched controls and pwMS. Bilateral pure-tone hearing thresholds were obtained at frequencies of 250 hertz (Hz), 500 Hz, 750 Hz, 1000 Hz, 1500 Hz, 2000 Hz, 4000 Hz, 6000 Hz and 8000 Hz, and hearing thresholds were defined as the lowest level at which the tone was perceived 50% of the time. Thresholds at 500 Hz, 1000 Hz, 2000 Hz and 4000 Hz were used to calculate the four-tone average for each participant, and only those with a bilateral four tone average of ≤ 25 dB HL were included in the analysis. To investigate SiN performance in pwMS, pre-recorded Bamford-Kowal-Bench (BKB) sentences were presented binaurally through headphones at five signal-to-noise ratios (SNR) in two noise conditions: speech-weighted noise and multi-talker babble. Participants were required to verbally repeat each sentence they had just heard; or indicate their inability to do so. A 33-item questionnaire, based on validated inventories for specific adult clinical populations with abnormal auditory processing, was used to evaluate auditory processing in daily life for pwMS. For analysis, pwMS were grouped according to their Expanded Disability Status Scale (EDSS) score as rated by a neurologist. PwMS with EDSS scores ≤ 1.5 were classified as 'mild' (n = 20); between 2 and 4.5 as 'moderate' (n = 16) and between 5 and 7 as 'advanced' (n = 10) and were compared to neurologically healthy controls (n = 38). The outcomes of the SiN task conducted in pwMS can be found in Iva et al., (2021). The present data has important implications for the timing and delivery of preparatory education to patients, family, and caregivers about communication abilities in pwMS. This dataset will also be valuable for the reuse/reanalysis required for future investigations into the clinical utility of SiN tasks to monitor disease progression.

Dataset Information

Dataset of directional room impulse responses for realistic speech data

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets