Project description:Background: Speech and language cues are significant data sources that can reveal insights into a person's behavior and well-being. The goal of this study is to evaluate how different machine learning (ML) classifiers, trained on both the spoken words and the acoustic features of live conversations between family caregivers and a therapist, correlate with anxiety and quality of life (QoL) as assessed by validated instruments. Methods: The dataset comprised 124 audio-recorded and professionally transcribed discussions between family caregivers of hospice patients and a therapist about the challenges they faced in their caregiving role, together with standardized assessments of self-reported QoL and anxiety. We custom-built and trained an Automated Speech Recognition (ASR) system on older adult voices and created a logistic regression-based classifier that incorporated audio-based features. The classification process automated QoL scoring and the real-time display of the score, replacing hand-coding of self-reported assessments with a machine-learning-identified classifier. Findings: Of the 124 audio files and their transcripts, 87 (70%) were selected as the training set, with the remaining 30% of the data held out for evaluation. For anxiety, adding the dimension of sound and an automated speech-to-text transcription outperformed the prior classifier trained only on human-rendered transcriptions: precision improved from 86% to 92%, accuracy from 81% to 89%, and recall from 78% to 88%. Interpretation: Classifiers developed through ML techniques can indicate improvements in QoL measures with a reasonable degree of accuracy. Examining the content, the sound of the voice, and the context of the conversation provides insights into additional factors affecting anxiety and QoL that could be addressed in tailored therapy and in the design of conversational agents serving as therapy chatbots.
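As a minimal sketch of the kind of pipeline described above, the snippet below trains a logistic-regression classifier on combined transcript and acoustic features with a 70/30 split and reports precision, accuracy, and recall. The helper name, feature layout, and binary anxiety labels are illustrative assumptions, not the study's actual implementation.

# Minimal sketch (assumptions, not the study's code): logistic regression on
# combined transcript (TF-IDF) and acoustic features, 70/30 train/test split.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

def evaluate_anxiety_classifier(transcripts, acoustic_features, labels):
    """transcripts: list of ASR transcripts; acoustic_features: (n, k) array of
    audio descriptors (e.g., pitch, energy); labels: binary anxiety indicators."""
    labels = np.asarray(labels)
    acoustic_features = np.asarray(acoustic_features)
    idx_train, idx_test = train_test_split(
        np.arange(len(labels)), train_size=0.7, stratify=labels, random_state=0)
    # Fit the text vectorizer on the training transcripts only.
    vec = TfidfVectorizer(min_df=2).fit([transcripts[i] for i in idx_train])
    def features(idx):
        return hstack([vec.transform([transcripts[i] for i in idx]),
                       csr_matrix(acoustic_features[idx])])
    clf = LogisticRegression(max_iter=1000).fit(features(idx_train), labels[idx_train])
    y_pred = clf.predict(features(idx_test))
    y_true = labels[idx_test]
    return {"precision": precision_score(y_true, y_pred),
            "accuracy": accuracy_score(y_true, y_pred),
            "recall": recall_score(y_true, y_pred)}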
Project description:Children learn words through an accumulation of interactions grounded in context. Although many factors in the learning environment have been shown to contribute to word learning in individual studies, no empirical synthesis connects across factors. We introduce a new ultradense corpus of audio and video recordings of a single child's life that allows us to measure the child's experience of each word in his vocabulary. This corpus provides the first direct comparison, to our knowledge, between different predictors of the child's production of individual words. We develop a series of new measures of the distinctiveness of the spatial, temporal, and linguistic contexts in which a word appears, and show that these measures are stronger predictors of learning than frequency of use and that, unlike frequency, they play a consistent role across different syntactic categories. Our findings provide a concrete instantiation of classic ideas about the role of coherent activities in word learning and demonstrate the value of multimodal data in understanding children's language acquisition.
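To make the idea of contextual distinctiveness concrete, here is a small illustrative sketch, not the study's actual measure: it scores each word by how unevenly its uses cluster across time bins (one plausible temporal-distinctiveness proxy) so the score can be compared with raw frequency as a predictor of learning. The function and variable names are hypothetical.

# Illustrative sketch (hypothetical measure): temporal distinctiveness as
# 1 - normalized entropy of a word's occurrences across time bins. Words whose
# uses cluster in a few coherent episodes score high; uniformly spread words
# score low.
import numpy as np

def temporal_distinctiveness(timestamps, n_bins=24):
    """timestamps: times (e.g., hours) at which the word was heard by the child."""
    t = np.asarray(timestamps, dtype=float)
    counts, _ = np.histogram(t, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log(p))
    return 1.0 - entropy / np.log(n_bins)

# A predictor comparison could then correlate each measure with the age of
# first production, e.g. (hypothetical arrays):
#   from scipy.stats import spearmanr
#   spearmanr(np.log(frequencies), ages_of_first_production)
#   spearmanr(distinctiveness_scores, ages_of_first_production)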
Project description:The Prosodic Parallelism hypothesis claims that adjacent prosodic categories prefer identical branching of internal adjacent constituents. According to Wiese and Speyer (2015), this preference implies that feet contained in the same phonological phrase display either binary or unary branching, but not different types of branching. The seemingly free schwa-zero alternations at the end of some words in German make it possible to test this hypothesis. The hypothesis was successfully tested in a corpus study using large-scale bodies of written German. As some open questions remain, and as it is unclear whether Prosodic Parallelism holds for the spoken modality as well, the present study extends the inquiry to spoken German. As in the previous study, the results of a corpus analysis drawing on a variety of linguistic constructions are presented. The Prosodic Parallelism hypothesis is shown to be valid for spoken German as well as for written German. The paper thus contributes to the question of whether prosodic preferences are similar between the spoken and written modes of a language. Some consequences of the results for the production of language are discussed.
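As an illustration of how such a corpus test might be set up (a hedged sketch with a hypothetical annotation scheme, not the study's actual procedure), one can count how often adjacent feet within the same phonological phrase share their branching type and compare that rate against a chance baseline:

# Illustrative sketch (hypothetical annotations): do adjacent feet within a
# phonological phrase share their branching type ('binary' vs 'unary') more
# often than a chance baseline would predict?
from scipy.stats import binomtest

def parallelism_test(foot_pairs, baseline_same=0.5):
    """foot_pairs: list of (branching_1, branching_2) tuples for adjacent feet
    in the same phonological phrase. baseline_same should ideally be derived
    from the marginal distribution of branching types in the corpus."""
    same = sum(1 for a, b in foot_pairs if a == b)
    return binomtest(same, n=len(foot_pairs), p=baseline_same, alternative="greater")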
Project description:This study used electrophysiological recordings elicited by a large sample of spoken words to track the time course of word frequency, phonological neighbourhood density, concreteness and stimulus duration effects in two experiments. Fifty subjects were presented with more than a thousand spoken words during either a go/no-go lexical decision task (Experiment 1) or a go/no-go semantic categorisation task (Experiment 2) while EEG was collected. Linear mixed-effects modelling was used to analyse the data. Effects of word frequency were found on the N400, and also as early as 100 ms in Experiment 1 but not in Experiment 2. Phonological neighbourhood density produced an early effect around 250 ms as well as the typical N400 effect. Concreteness elicited effects in later epochs on the N400. Stimulus duration affected all epochs, and its influence reflected changes in the timing of the ERP components. Overall, the results support cascaded interactive models of spoken word recognition.
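The analysis described above lends itself to a linear mixed-effects specification. The sketch below is a hedged illustration: the column names and the by-subject-only random structure are simplifying assumptions, not the study's exact model.

# Illustrative sketch (hypothetical column names): a linear mixed-effects model
# of single-trial N400 amplitude with by-subject random intercepts. The actual
# analysis may differ, e.g., by also including by-item random effects.
import pandas as pd
import statsmodels.formula.api as smf

def fit_n400_model(trials: pd.DataFrame):
    """trials: one row per subject x word, with columns 'n400_amplitude',
    'log_frequency', 'neighbourhood_density', 'concreteness', 'duration',
    and 'subject'."""
    model = smf.mixedlm(
        "n400_amplitude ~ log_frequency + neighbourhood_density + concreteness + duration",
        data=trials,
        groups=trials["subject"],  # by-subject random intercepts
    )
    return model.fit()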
Project description:The present study investigates whether meaning is extracted similarly from spoken and sung sentences. For this purpose, subjects listened to semantically correct and incorrect sentences while performing a correctness judgement task. To examine the underlying neural mechanisms, a multi-methodological approach was chosen, combining two neuroscientific methods with behavioral data. In particular, fast dynamic changes reflected in the semantically associated N400 component of the electroencephalography (EEG) were assessed simultaneously with the topographically more fine-grained vascular signals acquired by functional near-infrared spectroscopy (fNIRS). EEG results revealed a larger N400 for incorrect than for correct sentences in both the spoken and the sung condition. However, the N400 was delayed for sung sentences, potentially due to their longer duration. fNIRS results revealed larger activations for spoken compared to sung sentences, irrespective of semantic correctness, in predominantly left-hemispheric areas, potentially suggesting greater familiarity with spoken material. Furthermore, fNIRS revealed widespread activation for correct compared to incorrect sentences irrespective of modality, potentially indicating successful processing of sentence meaning. The combined results indicate similar semantic processing in speech and song.
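As a minimal illustration of how the reported N400 correctness effect could be quantified (a hedged sketch with hypothetical data arrays, not the authors' analysis), per-subject mean amplitudes can be compared with a paired test within each modality:

# Illustrative sketch (hypothetical inputs): paired comparison of per-subject
# mean N400 amplitudes for semantically incorrect vs correct sentences, run
# separately for the spoken and the sung modality.
import numpy as np
from scipy.stats import ttest_rel

def n400_correctness_effect(incorrect_means, correct_means):
    """Each argument: one mean N400 amplitude per subject, in the same order."""
    return ttest_rel(np.asarray(incorrect_means), np.asarray(correct_means))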