Listening in Naturalistic Scenes: What Can Functional Near-Infrared Spectroscopy and Intersubject Correlation Analysis Tell Us About the Underlying Brain Activity?
ABSTRACT: Listening to speech in the noisy conditions of everyday life can be effortful, reflecting the increased cognitive workload involved in extracting meaning from a degraded acoustic signal. Studying the underlying neural processes has the potential to provide mechanistic insight into why listening is effortful under certain conditions. In a move toward studying listening effort under ecologically relevant conditions, we used the silent and flexible neuroimaging technique functional near-infrared spectroscopy (fNIRS) to examine brain activity during attentive listening to speech in naturalistic scenes. Thirty normally hearing participants listened to a series of narratives continuously varying in acoustic difficulty while undergoing fNIRS imaging. Participants then listened to another set of closely matched narratives and rated perceived effort and intelligibility for each scene. As expected, self-reported effort generally increased with worsening signal-to-noise ratio. After controlling for better-ear signal-to-noise ratio, perceived effort was greater in scenes that contained competing speech than in those that did not, potentially reflecting an additional cognitive cost of overcoming informational masking. We analyzed the fNIRS data using intersubject correlation, a data-driven approach suitable for analyzing data collected under naturalistic conditions. Significant intersubject correlation was seen in the bilateral auditory cortices and in a range of channels across the prefrontal cortex. The involvement of prefrontal regions is consistent with the notion that higher order cognitive processes are engaged during attentive listening to speech in complex real-world conditions. However, further research is needed to elucidate the relationship between perceived listening effort and activity in these extended cortical networks.
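To make the analysis approach concrete, intersubject correlation over channel-wise fNIRS data is often computed with a leave-one-out scheme: each participant's time series is correlated with the average time series of the remaining participants. The sketch below is a minimal Python illustration under assumed inputs (preprocessed, z-scored HbO time series; the array shapes and file name are hypothetical) and is not the authors' analysis pipeline.

```python
import numpy as np

def leave_one_out_isc(data):
    """Leave-one-out intersubject correlation (ISC).

    data : array of shape (n_subjects, n_channels, n_timepoints) containing
           preprocessed (filtered, z-scored) HbO time series.
    Returns an (n_subjects, n_channels) array of Pearson correlations between
    each subject's time series and the mean time series of all other subjects.
    """
    n_subj, n_chan, _ = data.shape
    isc = np.zeros((n_subj, n_chan))
    for s in range(n_subj):
        others = np.delete(data, s, axis=0).mean(axis=0)  # average of the remaining subjects
        for c in range(n_chan):
            isc[s, c] = np.corrcoef(data[s, c], others[c])[0, 1]
    return isc

# Hypothetical usage: 30 subjects, 40 channels, 10-minute recordings sampled at 5 Hz
# hbo = np.load("hbo_timeseries.npy")                  # shape (30, 40, 3000)
# group_isc = leave_one_out_isc(hbo).mean(axis=0)      # group-level ISC per channel
```

Statistical significance of such channel-wise ISC values is typically assessed non-parametrically, for example by circularly shifting or phase-randomizing the time series to build a null distribution.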
Project description: Speech provides a powerful means for sharing emotions. Here we implement novel intersubject phase synchronization and whole-brain dynamic connectivity measures to show that networks of brain areas become synchronized across participants who are listening to emotional episodes in spoken narratives. Twenty participants' hemodynamic brain activity was measured with functional magnetic resonance imaging (fMRI) while they listened to 45-s narratives describing unpleasant, neutral, and pleasant events spoken in a neutral voice. After scanning, participants listened to the narratives again and continuously rated their feelings of pleasantness-unpleasantness (valence) and arousal-calmness. Instantaneous intersubject phase synchronization (ISPS) measures were computed to derive both multi-subject voxel-wise similarity measures of hemodynamic activity and inter-area functional dynamic connectivity (seed-based phase synchronization, SBPS). Valence and arousal time series were subsequently used to predict the ISPS and SBPS time series. High arousal was associated with increased ISPS in the auditory cortices and in Broca's area, and negative valence was associated with enhanced ISPS in the thalamus, anterior cingulate, lateral prefrontal, and orbitofrontal cortices. Negative valence affected functional connectivity of fronto-parietal, limbic (insula, cingulum) and fronto-opercular circuitries, and high arousal affected the connectivity of the striatum, amygdala, thalamus, cerebellum, and dorsal frontal cortex. Positive valence and low arousal had markedly smaller effects. We propose that high arousal synchronizes the listeners' sound-processing and speech-comprehension networks, whereas negative valence synchronizes circuitries supporting emotional and self-referential processing.
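As a rough illustration of the ISPS measure described above (not the authors' implementation; the filtering, sampling details, and variable names are assumptions), instantaneous phases can be extracted with a Hilbert transform and their coherence across subjects computed at every time point:

```python
import numpy as np
from scipy.signal import hilbert

def isps_timecourse(signals):
    """Instantaneous intersubject phase synchronization for one voxel or ROI.

    signals : array of shape (n_subjects, n_timepoints), band-pass-filtered and
              demeaned BOLD time series from the same voxel in every subject.
    Returns a length-n_timepoints vector in [0, 1]: at each volume, the length
    of the mean resultant vector of the subjects' instantaneous phases.
    """
    phases = np.angle(hilbert(signals, axis=1))          # instantaneous phase per subject
    return np.abs(np.mean(np.exp(1j * phases), axis=0))  # phase coherence across subjects

# Hypothetical usage for a single voxel across 20 subjects:
# bold = np.load("voxel_bold.npy")      # shape (20, n_volumes)
# isps = isps_timecourse(bold)          # then regressed on valence/arousal ratings
```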
Project description: Perceived listening effort was assessed for a monaural irregular-rhythm detection task while competing signals were presented to the contralateral ear. When speech was the competing signal, listeners reported greater listening effort compared to either contralateral steady-state noise or no competing signal. Behavioral thresholds for irregular-rhythm detection were unaffected by competing speech, indicating that listeners compensated for this competing signal with effortful listening. These results suggest that perceived listening effort may be associated with suppression of task-irrelevant information, even for conditions where informational masking and competition for linguistic processing resources would not be expected.
Project description: In recent years, there has been increasing interest in studying listening effort. Research on listening effort intersects with the development of active theories of speech perception and contributes to the broader endeavor of understanding speech perception within the context of neuroscientific theories of perception, attention, and effort. Due to the multidisciplinary nature of the problem, researchers vary widely in their precise conceptualization of the catch-all term "listening effort". Very recent consensus work stresses the relationship between listening effort and the allocation of cognitive resources, providing a conceptual link to current cognitive neuropsychological theories associating effort with the allocation of selective attention. By linking listening effort to attentional effort, we enable the application of a taxonomy of external and internal attention to the characterization of effortful listening. More specifically, we use a vectorial model to decompose the demand causing listening effort into its mutually orthogonal external and internal components and map the relationship between demanded and exerted effort by means of a resource-limiting term that can represent the influence of motivation as well as vigilance and arousal. Due to its quantitative nature and easy graphical interpretation, this model can be applied to a broad range of problems dealing with listening effort. As such, we conclude that the model provides a good starting point for further research on effortful listening within a more differentiated neuropsychological framework.
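One way to write down the structure of such a vectorial model, purely as an illustration (the symbols and the particular form of the resource-limiting term are assumptions, not the authors' exact formulation), is:

```latex
% Demand as a vector with mutually orthogonal external and internal components
\vec{d} = d_{\mathrm{ext}}\,\hat{e}_{\mathrm{ext}} + d_{\mathrm{int}}\,\hat{e}_{\mathrm{int}},
\qquad \hat{e}_{\mathrm{ext}} \perp \hat{e}_{\mathrm{int}},
\qquad \lVert\vec{d}\rVert = \sqrt{d_{\mathrm{ext}}^{2} + d_{\mathrm{int}}^{2}}

% Exerted effort as demanded effort scaled by a resource-limiting term,
% with 0 <= r(m, v) <= 1 modulated by motivation m and vigilance/arousal v
E_{\mathrm{exerted}} = r(m, v)\,\lVert\vec{d}\rVert
```

In this reading, the orthogonal components capture external (stimulus-driven) and internal (memory- or task-driven) attentional demand, and the limiting term caps how much of the demanded effort is actually exerted.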
Project description: Listening to speech in noise is effortful, particularly for people with hearing impairment. While it is known that effort is related to a complex interplay between bottom-up and top-down processes, the cognitive and neurophysiological mechanisms contributing to effortful listening remain unknown. Therefore, a reliable physiological measure to assess effort remains elusive. This study aimed to determine whether pupil dilation and alpha power change, two physiological measures suggested to index listening effort, assess similar processes. Listening effort was manipulated by parametrically varying spectral resolution (16- and 6-channel noise vocoding) and speech reception thresholds (SRT; 50% and 80%) while 19 young, normal-hearing adults performed a speech recognition task in noise. Off-line sentence scoring revealed discrepancies between the target SRTs and the true performance obtained during the speech recognition task; in the SRT80% condition, for example, participants scored an average of 64.7%. Participants' true performance levels were therefore used for subsequent statistical modelling. Both measures appeared to be sensitive to changes in spectral resolution (channel vocoding), but only pupil dilation was also significantly related to true performance levels (%) and task accuracy (i.e., whether the response was correctly or partially recalled). The two measures were not correlated, suggesting that they may each reflect different cognitive processes involved in listening effort. This combination of findings contributes to a growing body of research aiming to develop an objective measure of listening effort.
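As a hedged illustration of how such trial-level measures can be related to spectral resolution and true performance, one could fit a linear mixed-effects model of the following general form; the column names, data layout, and model structure are assumptions for illustration, not the authors' statistical analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per trial per participant, with columns
# for pupil dilation, vocoder channels (6 or 16), true performance (%), and accuracy.
trials = pd.read_csv("trial_measures.csv")

# Pupil dilation modelled as a function of spectral resolution, true performance,
# and trial accuracy, with a random intercept per participant.
pupil_model = smf.mixedlm(
    "pupil_dilation ~ C(vocoder_channels) + true_performance + accuracy",
    data=trials,
    groups=trials["participant"],
).fit()
print(pupil_model.summary())
```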
Project description: Individuals with hearing loss allocate cognitive resources to comprehend noisy speech in everyday life. One such scenario arises when they are exposed to ongoing speech and need to sustain their attention over a relatively long period of time, which requires listening effort. Two well-established physiological methods that have been found to be sensitive to changes in listening effort are pupillometry and electroencephalography (EEG). However, these measures have mainly been used for momentary, evoked, or episodic effort. The aim of this study was to investigate how sustained effort manifests in pupillometry and EEG, using continuous speech with varying signal-to-noise ratio (SNR). Eight hearing-aid users participated in this exploratory study and performed a continuous speech-in-noise task. The speech material consisted of 30-second continuous streams presented from loudspeakers to the right and left of the listener (±30° azimuth) in the presence of 4-talker background noise (180° azimuth). The participants were instructed to attend to either the right or the left speaker and ignore the other, in randomized order, under two SNR conditions: 0 dB and -5 dB (the level difference between the target and the competing talker). The effects of SNR on listening effort were explored objectively using pupillometry and EEG. The results showed larger mean pupil dilation and decreased parietal EEG alpha power during the more effortful condition. This study demonstrates that both measures are sensitive to changes in SNR during continuous speech.
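For illustration, parietal alpha power per 30-second trial can be estimated with a Welch periodogram as sketched below; the sampling rate, channel selection, and segmentation are assumptions, not the authors' EEG pipeline.

```python
import numpy as np
from scipy.signal import welch

def alpha_power(eeg, fs=500.0, band=(8.0, 12.0)):
    """Mean power in the alpha band for a single-channel EEG segment.

    eeg : 1-D array holding one 30-second trial from, e.g., a parietal channel.
    fs  : sampling rate in Hz (assumed value).
    """
    freqs, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))  # 2-second Welch windows
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[mask].mean()

# Hypothetical comparison across SNR conditions for one participant:
# alpha_0dB      = np.mean([alpha_power(trial) for trial in trials_0dB])
# alpha_minus5dB = np.mean([alpha_power(trial) for trial in trials_minus5dB])
```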
Project description: Understanding speech is effortless in ideal situations, and although adverse conditions, such as those caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It remains unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners' ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected because of their substantial role in listeners' rapid limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli, particularly in the presence of mismatching cues. Signal degradation reduced listeners' ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because this effect had its source at the pre-lexical level, it can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing and different sources of increased mental effort. We argue that the well-timed progression of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why, in ideal situations, speech perception is an undemanding task. Degradation of the signal or of the receiver channel can quickly bring this well-adjusted timing out of balance and lead to an increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has consequences for lexical processing, as it adds uncertainty to the forming and revising of lexical hypotheses.
Project description: Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened to the audio drama for approximately 19 min while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject-correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, cingulate cortex, and amygdala. Group-level independent component (IC) analysis was used to parcel out the brain signals into functionally coupled networks, and the dependence of the ICs on external stimuli was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs, two of which, covering non-overlapping areas of the auditory cortex, were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC, four ICs defined as intrinsic fluctuated similarly to the time courses of either the speech-related or the all-sounds-related extrinsic ICs. These ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, with results suggesting that speech is processed in two separate networks, one dedicated to the processing of speech sounds and the other to both speech and non-speech sounds.
Project description: Monitoring speech tasks with functional near-infrared spectroscopy (fNIRS) enables investigation of speech production mechanisms and informs treatment strategies for speech-related disorders such as stuttering. Unfortunately, movement of the temporalis muscle during speech production can induce relative movement between probe optodes and skin. These movements generate motion artifacts during speech tasks. In practice, spurious hemodynamic responses arise in functional activation signals because the consequences of speech-related motion artifacts are poorly characterized and because there are no standardized processing procedures for fNIRS signals during speech tasks. To this end, we characterize the effects of speech production on fNIRS signals, and we introduce a systematic analysis to ameliorate motion artifacts. The study measured 50 healthy subjects performing jaw movement (JM) tasks and found that JM produces two different patterns of motion artifacts in fNIRS. To remove these unwanted contributions, we validated a hybrid motion-correction algorithm based on spline interpolation followed by wavelet filtering. We compared the performance of the hybrid algorithm with standard algorithms based on spline interpolation only and on wavelet decomposition only. The hybrid algorithm corrected 94% of the artifacts produced by JM and did not introduce spurious responses into the data. We also validated the hybrid algorithm during a reading task performed under two different conditions: reading aloud and reading silently. For both conditions, we observed significant cortical activation in brain regions related to reading. Moreover, when comparing the two conditions, good agreement of spatial and temporal activation patterns was found only when data were analyzed using the hybrid approach. Overall, the study demonstrates a standardized processing scheme for fNIRS data during speech protocols. The scheme decreases spurious responses and intersubject variability due to motion artifacts.
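The hybrid spline-plus-wavelet idea can be sketched conceptually as follows. This is a deliberately simplified Python illustration, not the authors' algorithm or a standard toolbox implementation: the artifact segmentation, the smoothing parameter, and the coefficient-thresholding rule are all assumptions.

```python
import numpy as np
import pywt
from scipy.interpolate import UnivariateSpline

def spline_correct(signal, artifact_mask, smooth=0.99):
    """Step 1 (simplified): within each segment flagged as a motion artifact,
    fit a smoothing spline and subtract it, preserving the segment's mean level.

    artifact_mask : boolean array marking samples flagged as artifacts
                    (e.g., by a moving-standard-deviation threshold).
    """
    corrected = signal.copy()
    t = np.arange(len(signal))
    edges = np.flatnonzero(np.diff(artifact_mask.astype(int)))
    bounds = np.concatenate(([0], edges + 1, [len(signal)]))
    for start, stop in zip(bounds[:-1], bounds[1:]):
        if artifact_mask[start] and stop - start > 3:
            seg = signal[start:stop]
            spline = UnivariateSpline(t[start:stop], seg, s=smooth * (stop - start))
            corrected[start:stop] = seg - spline(t[start:stop]) + seg.mean()
    return corrected

def wavelet_correct(signal, wavelet="db5", iqr_factor=1.5):
    """Step 2 (simplified): zero out detail coefficients that are outliers
    relative to the interquartile range at each decomposition level."""
    coeffs = pywt.wavedec(signal, wavelet)
    cleaned = [coeffs[0]]                       # keep the approximation coefficients
    for detail in coeffs[1:]:
        q1, q3 = np.percentile(detail, [25, 75])
        lo, hi = q1 - iqr_factor * (q3 - q1), q3 + iqr_factor * (q3 - q1)
        cleaned.append(np.where((detail < lo) | (detail > hi), 0.0, detail))
    return pywt.waverec(cleaned, wavelet)[: len(signal)]

# Hybrid correction: spline step first, then wavelet filtering
# corrected = wavelet_correct(spline_correct(raw_channel, artifact_mask))
```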
Project description: Communication success under adverse conditions requires efficient and effective recruitment of both bottom-up (sensori-perceptual) and top-down (cognitive-linguistic) resources to decode the intended auditory-verbal message. Employing these limited-capacity resources has been shown to vary across the lifespan, with evidence indicating that younger adults outperform older adults in both comprehension and memory of the message. This study examined how sources of interference arising from the speaker (message spoken with a conversational vs. clear speech technique), the listener (hearing-listening and cognitive-linguistic factors), and the environment (competing speech babble noise vs. quiet) interact and influence learning and memory performance, using more ecologically valid methods than have been used previously. The results suggest that when older adults listened to complex medical prescription instructions produced with "clear speech" (presented at audible levels through insert earphones), their learning efficiency and their immediate and delayed memory performance improved relative to their performance when they listened to speech produced at a normal conversational rate (presented at audible levels in the sound field). This better learning and memory performance for clear-speech listening was maintained even in the presence of speech babble noise. The finding that the largest learning-practice effect on second-trial performance in the conversational-speech condition occurred when the clear-speech listening condition came first suggests greater experience-dependent perceptual learning of, or adaptation to, the speaker's speech and voice pattern in clear speech. This suggests that experience-dependent perceptual learning plays a role in facilitating language processing and comprehension of a message and its subsequent memory encoding.
Project description: Healthy aging is accompanied by listening difficulties, including decreased speech comprehension, that stem from an ill-understood combination of sensory and cognitive changes. Here, we use electroencephalography to demonstrate that, during attentive listening, auditory neural oscillations of older adults entrain less firmly and less flexibly to speech-paced (~3 Hz) rhythms than those of younger adults. These neural entrainment effects are distinct in magnitude and origin from the neural response to sound per se. Non-entrained parieto-occipital alpha (8-12 Hz) oscillations are enhanced in young adults, but suppressed in older participants, during attentive listening. Entrained neural phase and task-induced alpha amplitude exert opposite, complementary effects on listening performance: higher alpha amplitude is associated with reduced entrainment-driven modulation of behavioural performance. Thus, alpha amplitude, as a task-driven neuromodulatory signal, can counteract the behavioural corollaries of neural entrainment. Balancing these two neural strategies may present new paths for intervention in age-related listening difficulties.