Project description:Objectives: Everyday listening environments are filled with competing noise and distractors. Although significant research has examined the effect of competing noise on speech recognition and listening effort, little is understood about the effect of distraction. The framework for understanding effortful listening recognizes the importance of attention-related processes in speech recognition and listening effort; however, it underspecifies the role that they play, particularly with respect to distraction. The load theory of attention predicts that resources will be automatically allocated to processing a distractor, but only if perceptual load in the listening task is low enough. If perceptual load is high (i.e., listening in noise), then resources that would otherwise be allocated to processing a distractor are used to overcome the increased perceptual load and are unavailable for distractor processing. Although there is ample evidence for this theory in the visual domain, there has been little research investigating how the load theory of attention may apply to speech processing. In this study, we sought to measure the effect of distractors on speech recognition and listening effort and to evaluate whether the load theory of attention can be used to understand a listener's resource allocation in the presence of distractors. Design: Fifteen adult listeners participated in a monosyllabic word repetition task. Test stimuli were presented in quiet or in competing speech (+5 dB signal-to-noise ratio) and in distractor or no-distractor conditions. In conditions with distractors, auditory distractors were presented before the target words on 24% of the trials in quiet and in noise. Percent correct was recorded as the measure of speech recognition, and verbal response time (VRT) was recorded as the measure of listening effort. Results: A significant interaction was present for speech recognition, showing reduced speech recognition when distractors were presented in the quiet condition but no effect of distractors when noise was present. VRTs were significantly longer when distractors were present, regardless of listening condition. Conclusions: Consistent with the load theory of attention, distractors significantly reduced speech recognition in the low-perceptual-load condition (i.e., listening in quiet) but did not impact speech recognition scores in conditions of high perceptual load (i.e., listening in noise). The increases in VRTs in the presence of distractors in both low- and high-perceptual-load conditions (i.e., quiet and noise) suggest that the load theory of attention may not apply to listening effort. However, the large effect of distractors on VRT in both conditions is consistent with previous work demonstrating that distraction-related shifts of attention can delay processing of the target task. These findings also fit within the framework for understanding effortful listening, which proposes that involuntary attentional shifts deplete cognitive resources, leaving fewer resources readily available to process the signal of interest, resulting in increased listening effort (i.e., elongated VRT).
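To make the analysis structure above concrete, the following is a minimal sketch of how a 2 x 2 within-subject interaction (quiet/noise x distractor/no distractor) could be tested on the percent-correct or VRT measures in Python with statsmodels; the data frame, column names, and simulated values are hypothetical placeholders, not the study's actual data or analysis code.

```python
# Minimal sketch of a 2 x 2 within-subject analysis (quiet/noise x distractor/no-distractor),
# assuming trial-level results have already been averaged per listener and condition.
# Column names and values are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
subjects = np.arange(15)
conditions = [("quiet", "none"), ("quiet", "distractor"),
              ("noise", "none"), ("noise", "distractor")]

rows = []
for s in subjects:
    for background, distractor in conditions:
        # Illustrative values only: percent correct and verbal response time (ms)
        rows.append({"subject": s,
                     "background": background,
                     "distractor": distractor,
                     "pct_correct": rng.normal(85, 5),
                     "vrt_ms": rng.normal(900, 100)})
df = pd.DataFrame(rows)

# Interaction test for speech recognition; the same call with depvar="vrt_ms"
# would test the listening-effort measure.
anova = AnovaRM(df, depvar="pct_correct", subject="subject",
                within=["background", "distractor"]).fit()
print(anova)
```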
Project description:Listening effort is a valuable and important notion to measure because it is among the primary complaints of people with hearing loss. It is tempting and intuitive to accept speech intelligibility scores as a proxy for listening effort, but this link is likely oversimplified and lacks actionable explanatory power. This study was conducted to explain the mechanisms of listening effort that are not captured by intelligibility scores, using sentence-repetition tasks in which specific kinds of mistakes were either prospectively planned or analyzed retrospectively. Effort was measured as changes in pupil size among 20 listeners with normal hearing and 19 listeners with cochlear implants. Experiment 1 demonstrates that mental correction of misperceived words increases effort even when responses are correct. Experiment 2 shows that for incorrect responses, listening effort is not a function of the proportion of words correct but is rather driven by the types of errors, the position of errors within a sentence, and the need to resolve ambiguity, reflecting how easily the listener can make sense of a perception. A simple taxonomy of error types is provided that is both intuitive and consistent with the data from these two experiments. The diversity of errors in these experiments implies that speech perception tasks can be designed prospectively to elicit the mistakes that are more closely linked with effort. Although mental corrective action and the number of mistakes can scale together in many experiments, it is possible to dissociate them to advance toward a more explanatory (rather than correlational) account of listening effort.
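As a rough illustration of the word-level scoring such sentence-repetition data require, here is a small sketch that computes the proportion of words correct, the positions of errors, and a coarse per-word error label; the labels ("correct", "omission", "substitution") are illustrative only and are not the taxonomy proposed in the study.

```python
# Minimal sketch of word-level scoring for a sentence-repetition response, capturing the
# proportion of words correct and the position of errors. Labels are illustrative.
def score_response(target: str, response: str):
    target_words = target.lower().split()
    response_words = response.lower().split()
    word_results = []
    for i, tw in enumerate(target_words):
        rw = response_words[i] if i < len(response_words) else None
        if rw == tw:
            word_results.append((i, tw, rw, "correct"))
        elif rw is None:
            word_results.append((i, tw, rw, "omission"))
        else:
            word_results.append((i, tw, rw, "substitution"))
    n_correct = sum(1 for r in word_results if r[3] == "correct")
    return {
        "proportion_correct": n_correct / len(target_words),
        "error_positions": [i for i, *_, kind in word_results if kind != "correct"],
        "details": word_results,
    }

print(score_response("the cat sat on the mat", "the cat sat on a mat"))
```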
Project description:The talking face affords multiple types of information. To isolate cortical sites with responsibility for integrating linguistically relevant visual speech cues, speech and nonspeech face gestures were presented in natural video and point-light displays during fMRI scanning at 3.0T. Participants with normal hearing viewed the stimuli and also viewed localizers for the fusiform face area (FFA), the lateral occipital complex (LOC), and the visual motion (V5/MT) regions of interest (ROIs). The FFA, the LOC, and V5/MT were significantly less activated for speech relative to nonspeech and control stimuli. Distinct activation of the posterior superior temporal sulcus and the adjacent middle temporal gyrus to speech, independent of media, was obtained in group analyses. Individual analyses showed that speech and nonspeech stimuli were associated with adjacent but different activations, with the speech activations more anterior. We suggest that the speech activation area is the temporal visual speech area (TVSA), and that it can be localized with the combination of stimuli used in this study.
Project description:Recently, the measurement of the pupil dilation response has been applied in many studies to assess listening effort. Meanwhile, the mechanisms underlying this response are still largely unknown. We present the results of a method that separates the influence of the parasympathetic and sympathetic branches of the autonomic nervous system on the pupil response during speech perception. This is achieved by changing the background illumination level. In darkness, the influence of the parasympathetic nervous system on the pupil response is minimal, whereas in light, there is an additional component from the parasympathetic nervous system. Nineteen hearing-impaired and 27 age-matched normal-hearing listeners performed speech reception threshold tests targeting a 50% correct performance level while pupil responses were recorded. The target speech was masked with a competing talker. The test was conducted twice, once in a dark and once in a light condition. The Need for Recovery and Checklist Individual Strength questionnaires were administered as indices of daily-life fatigue. In the dark condition, the peak pupil dilation (PPD) did not differ between the two groups, but in the light condition, the normal-hearing group showed a larger PPD than the hearing-impaired group. Listeners with better hearing acuity showed larger differences in dilation between the dark and light conditions. These results indicate a larger effect of parasympathetic inhibition on the pupil dilation response of listeners with better hearing acuity, and relatively high parasympathetic activity in those with worse hearing. Previously observed differences in PPD between normal-hearing and hearing-impaired listeners are probably not solely due to differences in listening effort.
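For readers unfamiliar with the peak pupil dilation measure, the following sketch shows one common way a baseline-corrected PPD could be computed from a single-trial pupil trace; the sampling rate, baseline window, and analysis window are assumptions for illustration and are not the parameters used in this study.

```python
# Minimal sketch of baseline-corrected peak pupil dilation (PPD) for one trial,
# assuming a pupil-diameter trace sampled at a known rate. Timings are illustrative.
import numpy as np

def peak_pupil_dilation(trace_mm, fs_hz, baseline_s=1.0, window_s=(0.0, 3.0)):
    """Return PPD in mm relative to the pre-stimulus baseline."""
    baseline = np.mean(trace_mm[: int(baseline_s * fs_hz)])
    start = int((baseline_s + window_s[0]) * fs_hz)
    stop = int((baseline_s + window_s[1]) * fs_hz)
    return np.max(trace_mm[start:stop] - baseline)

# Illustrative trace: 1 s baseline followed by a slow dilation and recovery.
fs = 60
t = np.arange(0, 4, 1 / fs)
trace = 4.0 + 0.3 * np.exp(-((t - 2.0) ** 2) / 0.5) * (t > 1.0)
print(f"PPD = {peak_pupil_dilation(trace, fs):.2f} mm")
```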
Project description:Multi-talker speech intelligibility requires successful separation of the target speech from background speech. Successful speech segregation relies on bottom-up neural coding fidelity of sensory information and top-down effortful listening. Here, we studied the interaction between temporal processing, measured using Envelope Following Responses (EFRs) to amplitude-modulated tones, and pupil-indexed listening effort, as it related to performance on the Quick Speech-in-Noise (QuickSIN) test in normal-hearing adults. Listening effort increased at the more difficult signal-to-noise ratios, but speech intelligibility only decreased at the hardest signal-to-noise ratio. Pupil-indexed listening effort and EFRs did not independently relate to QuickSIN performance. However, the combined effects of both EFRs and listening effort explained significant variance in QuickSIN performance. Our results suggest a synergistic interaction between sensory coding and listening effort as it relates to multi-talker speech intelligibility. These findings can inform the development of next-generation multi-dimensional approaches for testing speech intelligibility deficits in listeners with normal hearing.
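The combined-predictor result can be pictured with a simple regression comparison: fit QuickSIN performance on the EFR measure alone, the pupil measure alone, and both together, and compare the explained variance. The sketch below uses simulated values and hypothetical variable names; it is not the study's actual model.

```python
# Minimal sketch comparing variance in QuickSIN performance explained by an EFR-based
# predictor, a pupil-based effort predictor, and their combination. Values are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 30
efr_amp = rng.normal(0, 1, n)          # envelope-following response amplitude (z-scored)
pupil_effort = rng.normal(0, 1, n)     # pupil-indexed listening effort (z-scored)
quicksin = 2.0 - 0.5 * efr_amp - 0.4 * pupil_effort + rng.normal(0, 1, n)  # SNR loss (dB)

df = pd.DataFrame({"quicksin": quicksin, "efr": efr_amp, "pupil": pupil_effort})

for predictors in (["efr"], ["pupil"], ["efr", "pupil"]):
    X = sm.add_constant(df[predictors])
    fit = sm.OLS(df["quicksin"], X).fit()
    print(predictors, f"R^2 = {fit.rsquared:.2f}")
```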
Project description:Background and objectives: Speech in noise (SIN) perception is essential for effective day-to-day communication, as everyday conversations seldom transpire in silent environments. Numerous studies have documented how musical training can aid in SIN discrimination through various neural pathways, such as experience-dependent plasticity and overlapping processes between music and speech perception. However, empirical evidence regarding the impact of musical training on SIN perception remains inconclusive. This study aimed to investigate whether musicians trained in the South Indian classical "Carnatic" style of music exhibited a distinct advantage over their non-musician counterparts in SIN perception. The study also attempted to explore whether the listening effort (LE) associated with this process differed across musicians and non-musicians, an area that has received limited attention. Subjects and methods: A quasi-experimental design was employed, involving two groups comprising 25 musicians and 35 non-musicians, aged 18-35 years, with normal hearing. In phase 1, participants' musical abilities were assessed using the Mini-Profile of Music Perception Skills (Mini-PROMS). In phase 2, SIN abilities were tested using the Tamil phonemically balanced words and the Tamil Matrix Sentence Test at -5 dB, 0 dB, and +5 dB SNR. Phase 3 tested LE using a dual-task paradigm with auditory and visual stimuli as the primary and secondary tasks. Results: Fractional logit and linear regression models demonstrated that musicians outperformed non-musicians in the Mini-PROMS assessment. Musicians also fared better than non-musicians in SIN and LE at 0 dB SNR for words and +5 dB SNR for sentences. Conclusions: The findings of this study provided limited evidence to support the claim that musical training improves speech perception in noisy environments or reduces the associated listening effort.
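A fractional logit model of the kind named in the Results can be approximated as a binomial GLM with a logit link fitted to proportion-correct scores with robust standard errors; the sketch below uses simulated scores and a single hypothetical group predictor, not the study's data.

```python
# Minimal sketch of a fractional logit model for proportion-correct SIN scores, fitted as a
# binomial GLM with a logit link and robust standard errors. Values are simulated.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 60
musician = np.repeat([1, 0], [25, 35])
score = np.clip(rng.normal(0.70 + 0.05 * musician, 0.10, n), 0.01, 0.99)  # proportion correct

df = pd.DataFrame({"score": score, "musician": musician})
model = smf.glm("score ~ musician", data=df,
                family=sm.families.Binomial()).fit(cov_type="HC1")
print(model.summary())
```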
Project description:Daily-life conversation relies on speech perception in quiet and in noise. Because of the COVID-19 pandemic, face masks have become mandatory in many situations. Acoustic attenuation of sound pressure by the mask tissue reduces speech perception ability, especially in noisy situations. Masks can also impede the process of speech comprehension by concealing the movements of the mouth, interfering with lip reading. In this prospective observational, cross-sectional study including 17 participants with normal hearing, we measured the influence of acoustic attenuation caused by medical face masks (mouth and nose protection) according to EN 14683 and by N95 masks according to EN 1149 on the speech recognition threshold and listening effort in various types of background noise. Averaged over all noise signals, a surgical mask significantly reduced the speech perception threshold in noise by 1.6 dB (95% confidence interval [CI], 1.0, 2.1) and an N95 mask reduced it significantly by 2.7 dB (95% CI, 2.2, 3.2). Use of a surgical mask did not significantly increase the 50% listening effort signal-to-noise ratio (increase of 0.58 dB; 95% CI, 0.4, 1.5), but use of an N95 mask did so significantly, by 2.2 dB (95% CI, 1.2, 3.1). In acoustic measures, the mask tissue reduced amplitudes by up to 8 dB at frequencies above 1 kHz, whereas no reduction was observed below 1 kHz. We conclude that face masks reduce speech perception and increase listening effort in different noise signals. Together with the additional interference caused by impeded lip reading, the compound effect of face masks could have a relevant impact on daily-life communication even in those with normal hearing.
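A within-subject threshold shift and its 95% confidence interval, as reported above, can be derived from paired measurements; the following sketch illustrates the computation with placeholder SRT values rather than the study's data.

```python
# Minimal sketch of a mean within-subject speech reception threshold (SRT) shift and its
# 95% confidence interval from paired no-mask vs. mask measurements. Values are placeholders.
import numpy as np
from scipy import stats

srt_no_mask = np.array([-6.1, -5.8, -6.5, -5.9, -6.3, -6.0, -5.7, -6.2])  # dB SNR
srt_n95_mask = srt_no_mask + np.array([2.5, 3.0, 2.2, 2.9, 2.6, 3.1, 2.4, 2.8])

diff = srt_n95_mask - srt_no_mask
mean_shift = diff.mean()
ci_low, ci_high = stats.t.interval(0.95, df=len(diff) - 1,
                                   loc=mean_shift, scale=stats.sem(diff))
print(f"Mean SRT shift = {mean_shift:.1f} dB (95% CI {ci_low:.1f}, {ci_high:.1f})")
```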
Project description:Facial emotion recognition occupies a prominent place in emotion psychology. How perceivers recognize messages conveyed by faces can be studied either explicitly or implicitly, and with different kinds of facial stimuli. In the present study, we explored for the first time how facial point-light displays (PLDs) (i.e., biological motion with minimal perceptual properties) can elicit both explicit and implicit mechanisms of facial emotion recognition. Participants completed tasks of explicit or implicit facial emotion recognition from PLDs. Results showed that point-light stimuli are sufficient to allow facial emotion recognition, be it explicit or implicit. We argue that this finding could encourage the use of PLDs in research on the perception of emotional cues from faces.
Project description:Listeners are routinely exposed to many different types of speech, including artificially enhanced and synthetic speech, styles which deviate to a greater or lesser extent from naturally spoken exemplars. While the impact of differing speech types on intelligibility is well studied, it is less clear how such types affect cognitive processing demands, and in particular whether those speech forms with the greatest intelligibility in noise have a commensurately lower listening effort. The current study measured intelligibility, self-reported listening effort, and a pupillometry-based measure of cognitive load for four distinct types of speech: (i) plain, i.e., natural unmodified speech; (ii) Lombard speech, a naturally enhanced form which occurs when speaking in the presence of noise; (iii) artificially enhanced speech, which involves spectral shaping and dynamic range compression; and (iv) speech synthesized from text. In the first experiment, a cohort of 26 native listeners responded to the four speech types in three levels of speech-shaped noise. In a second experiment, 31 non-native listeners underwent the same procedure at more favorable signal-to-noise ratios, chosen because second-language listening in noise has a more detrimental effect on intelligibility than listening in a first language. For both native and non-native listeners, artificially enhanced speech was the most intelligible and led to the lowest subjective effort ratings, while the reverse was true for synthetic speech. However, pupil data suggested that Lombard speech elicited the lowest processing demands overall. These outcomes indicate that the relationship between intelligibility and cognitive processing demands is not a simple inverse, but is mediated by speech type. The findings of the current study motivate the search for speech modification algorithms that are optimized for both intelligibility and listening effort.
Project description:Identifying speech requires that listeners make rapid use of fine-grained acoustic cues, a process that is facilitated by being able to see the talker's face. Face masks present a challenge to this process because they can both alter acoustic information and conceal the talker's mouth. Here, we investigated the degree to which different types of face masks and noise levels affect speech intelligibility and subjective listening effort for young (N = 180) and older (N = 180) adult listeners. We found that in quiet, mask type had little influence on speech intelligibility relative to speech produced without a mask for both young and older adults. However, with the addition of moderate (-5 dB SNR) and high (-9 dB SNR) levels of background noise, intelligibility dropped substantially for all types of face masks in both age groups. Across noise levels, transparent face masks and cloth face masks with filters impaired performance the most, and surgical face masks had the smallest influence on intelligibility. Participants also rated speech produced with a face mask as more effortful than unmasked speech, particularly in background noise. Although young and older adults were similarly affected by face masks and noise in terms of intelligibility and subjective listening effort, older adults showed poorer intelligibility overall and rated the speech as more effortful to process relative to young adults. This research will help individuals make more informed decisions about which types of masks to wear in various communicative settings.
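For context on what a -5 or -9 dB SNR condition entails, the sketch below shows one standard way to mix a speech signal with background noise at a target SNR by scaling the noise relative to the speech RMS; the signals here are synthetic placeholders, and the study's actual stimuli and calibration are not reproduced.

```python
# Minimal sketch of mixing speech and noise at a target SNR (e.g., -5 or -9 dB) by scaling
# the noise relative to the speech RMS. Signals are synthetic placeholders.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db`, then return the mixture."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    noise_scaled = noise * (rms(speech) / rms(noise)) / (10 ** (snr_db / 20))
    return speech + noise_scaled

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
speech = 0.1 * np.sin(2 * np.pi * 220 * t)               # stand-in for a speech signal
noise = np.random.default_rng(3).normal(0, 0.05, t.size)  # stand-in for background noise
mixed = mix_at_snr(speech, noise, snr_db=-5)

# Verify the achieved SNR of the mixture.
achieved = 20 * np.log10(np.sqrt(np.mean(speech ** 2)) / np.sqrt(np.mean((mixed - speech) ** 2)))
print(f"Achieved SNR = {achieved:.1f} dB")
```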