Intelligibility of clear speech: effect of instruction.
ABSTRACT: The authors investigated how clear speech instructions influence sentence intelligibility. Twelve speakers produced sentences in habitual, clear, hearing impaired, and overenunciate conditions. Stimuli were amplitude normalized and mixed with multitalker babble for orthographic transcription by 40 listeners. The main analysis investigated percentage-correct intelligibility scores as a function of the 4 conditions and speaker sex. Additional analyses included listener response variability, individual speaker trends, and an alternate intelligibility measure: proportion of content words correct. Relative to the habitual condition, the overenunciate condition was associated with the greatest intelligibility benefit, followed by the hearing impaired and clear conditions. Ten speakers followed this trend. The results indicated different patterns of clear speech benefit for male and female speakers. Greater listener variability was observed for speakers with inherently low habitual intelligibility compared to speakers with inherently high habitual intelligibility. Stable proportions of content words were observed across conditions. Clear speech instructions affected the magnitude of the intelligibility benefit. The instruction to overenunciate may be most effective in clear speech training programs. The findings may help explain the range of clear speech intelligibility benefit previously reported. Listener variability analyses suggested the importance of obtaining multiple listener judgments of intelligibility, especially for speakers with inherently low habitual intelligibility.
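As a concrete illustration of the two intelligibility metrics mentioned above, the hypothetical sketch below scores one transcribed sentence as percent words correct and as the proportion of content words among the correct words. The function-word list, the word-matching rule, and the example sentence are assumptions, not the study's scoring protocol.

```python
# Minimal, hypothetical scoring sketch (not the authors' procedure): percent
# words correct and proportion of content words among the correct words.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it"}  # assumed list

def score_transcript(target: str, response: str):
    target_words = target.lower().split()
    response_words = set(response.lower().split())

    correct = [w for w in target_words if w in response_words]
    pct_correct = 100.0 * len(correct) / len(target_words)

    content_correct = [w for w in correct if w not in FUNCTION_WORDS]
    prop_content = len(content_correct) / len(correct) if correct else 0.0
    return pct_correct, prop_content

# Example: 3 of the 6 target words are recovered (50%); all 3 are content words.
print(score_transcript("the dog chased the red ball", "a dog chased ball"))
```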
Project description: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear, hearing impaired, and overenunciate conditions. A variety of acoustic measures were obtained. Relative to habitual, the clear, hearing impaired, and overenunciate conditions were associated with different magnitudes of acoustic change for measures of vowel production, speech timing, and vocal intensity. The overenunciate condition tended to yield the greatest magnitude of change in vowel spectral measures and speech timing, followed by the hearing impaired and clear conditions. SPL tended to be the greatest in the hearing impaired condition for half of the speakers studied. Different instructions for eliciting clear speech yielded acoustic adjustments of varying magnitude. Results have implications for direct comparison of studies using different instructions for eliciting clear speech. Results also have implications for optimizing clear speech training programs.
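Vowel spectral measures of the kind referenced above are often summarized in this literature as a vowel space area computed from corner-vowel formants. The sketch below is a generic illustration under that assumption; the formant values are invented, and the shoelace-area calculation is not necessarily the measure used in this study.

```python
# Hypothetical illustration: vowel space area (a common vowel spectral measure)
# computed from corner-vowel formants with the shoelace formula.
def vowel_space_area(corners):
    """Polygon area over (F1, F2) corner-vowel points (Hz) via the shoelace formula."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]
        area += f1_a * f2_b - f1_b * f2_a
    return abs(area) / 2.0

# Invented (F1, F2) values in Hz for /i/, /ae/, /a/, /u/ in two speaking conditions.
habitual      = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]
overenunciate = [(270, 2500), (800, 1900), (850, 1050), (320, 850)]
print(vowel_space_area(habitual), vowel_space_area(overenunciate))  # larger area when overenunciating
```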
Project description: Many existing speech intelligibility prediction (SIP) algorithms can only account for acoustic factors affecting speech intelligibility and cannot predict intelligibility across corpora with different linguistic predictability. To address this, a linguistic component was added to five existing SIP algorithms by estimating linguistic corpus predictability using a pre-trained language model. The results showed improved SIP performance in terms of correlation and prediction error over a mixture of four datasets, each with a different English open-set corpus.
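The sketch below is a schematic of the general idea only: an acoustic SIP score is adjusted by a language-model estimate of corpus predictability. The function names, the linear-plus-sigmoid mapping, the weights, and the `lm_logprob` wrapper are placeholders, not the algorithms or fitting procedure evaluated in the study.

```python
# Schematic sketch only: combine an acoustic SIP score with an LM-based
# estimate of corpus predictability to predict intelligibility.
import math

def mean_lm_logprob(sentences, lm_logprob):
    """Average per-word log-probability of a corpus under a pre-trained LM.
    `lm_logprob(sentence)` is an assumed wrapper returning (log_prob, n_words)."""
    total_lp, total_words = 0.0, 0
    for s in sentences:
        lp, n = lm_logprob(s)
        total_lp += lp
        total_words += n
    return total_lp / total_words

def combined_prediction(acoustic_score, predictability, w0=0.0, w1=1.0, w2=0.5):
    """Illustrative linear-plus-sigmoid mapping to predicted intelligibility;
    in practice the weights would be fitted to listener data."""
    z = w0 + w1 * acoustic_score + w2 * predictability
    return 1.0 / (1.0 + math.exp(-z))

# Toy usage with a fake unigram "LM" standing in for a real pre-trained model.
fake_lm = lambda s: (-2.0 * len(s.split()), len(s.split()))
pred = mean_lm_logprob(["the boy ran home", "a cat sat"], fake_lm)
print(combined_prediction(acoustic_score=0.6, predictability=pred))
```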
Project description: This study examined whether visual speech provides speech-rhythm information that perceivers can use in speech perception. This was tested using speech that naturally varied in the familiarity of its rhythm. Thirty Australian English L1 listeners performed a speech-perception-in-noise task with English sentences produced by three speakers: an English L1 speaker (familiar rhythm); an experienced English L2 speaker who had a weak foreign accent (familiar rhythm); and an inexperienced English L2 speaker who had a strong foreign accent (unfamiliar rhythm). The spoken sentences were presented in three conditions: Audio-Only (AO), Audio-Visual with the mouth covered (AVm), and Audio-Visual (AV). Speech was best recognized in the AV condition regardless of the degree of foreign accent. However, speech recognition in AVm was better than in AO for the speech with no foreign accent and with a weak accent, but not for the speech with a strong accent. A follow-up experiment was conducted using only the speech with a strong foreign accent, under more audible conditions. The results again showed no difference between the AVm and AO conditions, indicating that the null effect was not due to a floor effect. We propose that speech rhythm is conveyed by the motion of the jaw opening and closing, and that perceivers use this information to better perceive speech in noise.
Project description: OBJECTIVE: This study aimed to investigate the effect of hearing protection devices (HPDs) on speech intelligibility in Persian work environments. Three current earmuffs, three earplugs, and one prototype molded earplug were tested on 15 randomly selected male subjects. The noise reduction of the HPDs was measured using the Real Ear Attenuation at Threshold (REAT) method. Speech intelligibility while using HPDs was determined from the speech discrimination score (SDS) at two signal-to-noise (S/N) ratios (0 and +5). Data were analyzed using SPSS 22. RESULTS: The actual-to-nominal noise reduction rating values ranged from 47 to 84% for the HPDs. At the two S/N ratios, no significant differences in speech intelligibility were observed while using HPDs (p > 0.05). At an S/N ratio of 0, speech intelligibility improved descriptively by up to 9.07% only with the common earmuffs. For the proposed molded earplugs, a significant difference of up to 21.27% in speech intelligibility was observed at an S/N ratio of 0 (p < 0.05). Increasing the HPDs' noise attenuation values led to an increase in speech interference (p < 0.05). HPDs that provide the minimum required noise attenuation while maintaining acceptable speech intelligibility should be worn by employees exposed to medium noise levels.
Project description: Perceiving speech in background noise presents a significant challenge to listeners. Intelligibility can be improved by seeing the face of a talker. This is of particular value to hearing impaired people and users of cochlear implants. It is well known that auditory-only speech understanding depends on factors beyond audibility. How these factors impact on the audio-visual integration of speech is poorly understood. We investigated audio-visual integration when either the interfering background speech (Experiment 1) or intelligibility of the target talkers (Experiment 2) was manipulated. Clear speech was also contrasted with sine-wave vocoded speech to mimic the loss of temporal fine structure with a cochlear implant. Experiment 1 showed that for clear speech, the visual speech benefit was unaffected by the number of background talkers. For vocoded speech, a larger benefit was found when there was only one background talker. Experiment 2 showed that visual speech benefit depended upon the audio intelligibility of the talker and increased as intelligibility decreased. Degrading the speech by vocoding resulted in even greater benefit from visual speech information. A single "independent noise" signal detection theory model predicted the overall visual speech benefit in some conditions but could not predict the different levels of benefit across variations in the background or target talkers. This suggests that, similar to audio-only speech intelligibility, the integration of audio-visual speech cues may be functionally dependent on factors other than audibility and task difficulty, and that clinicians and researchers should carefully consider the characteristics of their stimuli when assessing audio-visual integration.
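For context, one common formulation of an "independent noise" signal detection model predicts audio-visual sensitivity from the two unimodal sensitivities as a quadratic sum, as sketched below. This is a generic illustration of that class of model, not necessarily the exact model fit in the study.

```python
# Minimal sketch of one common "independent noise" formulation: optimal
# combination of two channels whose internal Gaussian noise is independent.
import math

def predicted_av_dprime(d_audio: float, d_visual: float) -> float:
    """Predicted audio-visual sensitivity under optimal, independent-noise combination."""
    return math.sqrt(d_audio**2 + d_visual**2)

print(predicted_av_dprime(1.0, 0.8))  # ~1.28, larger than either unimodal d'
```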
Project description: Background: The influence of hearing impairment on everyday hearing can be estimated by speech audiometry. There is a great deal of variability in the dependence of word recognition scores on pure-tone hearing loss. Materials and methods: A large clinical database of 28,261 records with complete tone and speech audiometry data was analyzed. The maximum monosyllabic word recognition score was evaluated as a function of hearing loss. Its distribution was analyzed in detail. Results: In a rank analysis, the distribution of percentiles was determined as a function of pure-tone hearing loss up to 80 dB HL. Conclusion: The percentiles of the distribution of maximum word recognition scores for a given pure-tone hearing loss derived here can be used as reference values for a disproportionately high loss of speech recognition.
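A rank/percentile analysis of this kind might be sketched as follows: bin records by pure-tone hearing loss and take percentiles of the maximum word recognition score within each bin. The bin width, percentile set, and variable names below are assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch of a percentile-by-hearing-loss analysis.
import numpy as np

def percentiles_by_hearing_loss(pta, wrs_max, bin_width=5, max_hl=80):
    """pta: pure-tone hearing loss (dB HL); wrs_max: maximum word recognition score (%)."""
    pta, wrs_max = np.asarray(pta), np.asarray(wrs_max)
    edges = np.arange(0, max_hl + bin_width, bin_width)
    table = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (pta >= lo) & (pta < hi)
        if in_bin.any():
            # 5th/25th/50th/75th/95th percentiles of word recognition within this bin
            table[(lo, hi)] = np.percentile(wrs_max[in_bin], [5, 25, 50, 75, 95])
    return table
```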
Project description: Neural speech tracking has advanced our understanding of how our brains rapidly map an acoustic speech signal onto linguistic representations and ultimately meaning. It remains unclear, however, how speech intelligibility is related to the corresponding neural responses. Many studies addressing this question vary the level of intelligibility by manipulating the acoustic waveform, but this makes it difficult to cleanly disentangle effects of intelligibility from underlying acoustical confounds. Here, using magnetoencephalography (MEG) recordings, we study neural measures of speech intelligibility by manipulating intelligibility while keeping the acoustics strictly unchanged. Acoustically identical degraded speech stimuli (three-band noise vocoded, ~20 s duration) are presented twice, but the second presentation is preceded by the original (non-degraded) version of the speech. This intermediate priming, which generates a 'pop-out' percept, substantially improves the intelligibility of the second degraded speech passage. We investigate how intelligibility and acoustical structure affect acoustic and linguistic neural representations using multivariate Temporal Response Functions (mTRFs). As expected, behavioral results confirm that perceived speech clarity is improved by priming. TRF analysis reveals that auditory (speech envelope and envelope onset) neural representations are not affected by priming, but only by the acoustics of the stimuli (bottom-up driven). Critically, our findings suggest that segmentation of sounds into words emerges with better speech intelligibility, and most strongly at the later (~400 ms latency) word processing stage, in prefrontal cortex (PFC), in line with engagement of top-down mechanisms associated with priming. Taken together, our results show that word representations may provide some objective measures of speech comprehension.
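As background, TRF estimation is commonly implemented as time-lagged ridge regression; the minimal sketch below illustrates that generic approach for a single predictor and a single neural channel. The lag range, regularization value, and circular-shift handling of edges are simplifying assumptions, not the authors' mTRF pipeline.

```python
# Minimal sketch of TRF estimation via time-lagged ridge regression for one
# predictor (e.g., the speech envelope) and one neural channel. Edge samples
# are handled by circular shifting for brevity; a real analysis would trim them.
import numpy as np

def estimate_trf(stimulus, response, fs, tmin=-0.1, tmax=0.5, lam=1e2):
    """Return TRF weights over lags from tmin to tmax (seconds)."""
    lags = np.arange(int(round(tmin * fs)), int(round(tmax * fs)) + 1)
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
    XtX = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ response)  # one weight per lag

# Toy usage: the recovered TRF should peak near the simulated 100 ms response lag.
fs = 100
rng = np.random.default_rng(0)
env = rng.standard_normal(fs * 60)
neural = np.roll(env, int(0.1 * fs)) + 0.5 * rng.standard_normal(env.size)
trf = estimate_trf(env, neural, fs)
```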
Project description: We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increased gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants.
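To make the filtering idea concrete, the sketch below low-passes a (log-)spectrogram in the two-dimensional modulation domain. The specific cutoffs, the rectangular mask, and the spectrogram parameters are illustrative assumptions rather than the authors' exact filtering technique.

```python
# Illustrative sketch of modulation filtering: mask the 2-D FFT of a spectrogram
# so that only low temporal and spectral modulations are retained.
import numpy as np

def modulation_lowpass(spec, t_step, f_step, wt_max=12.0, wf_max=4.0):
    """spec: (n_freq, n_time) log-spectrogram; t_step in s, f_step in kHz.
    Keeps temporal modulations below wt_max (Hz) and spectral modulations
    below wf_max (cycles/kHz); everything else is zeroed out."""
    M = np.fft.fft2(spec)
    wt = np.fft.fftfreq(spec.shape[1], d=t_step)   # temporal modulation axis (Hz)
    wf = np.fft.fftfreq(spec.shape[0], d=f_step)   # spectral modulation axis (cyc/kHz)
    mask = (np.abs(wf)[:, None] <= wf_max) & (np.abs(wt)[None, :] <= wt_max)
    return np.real(np.fft.ifft2(M * mask))
```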
Project description: Vibrotactile stimulation is believed to enhance auditory speech perception, offering potential benefits for cochlear implant (CI) users who may utilize compensatory sensory strategies. Our study advances previous research by directly comparing tactile speech intelligibility enhancements in normal-hearing (NH) and CI participants, using the same paradigm. Moreover, we assessed tactile enhancement considering stimulus non-specific, excitatory effects through an incongruent audio-tactile control condition that did not contain any speech-relevant information. In addition to this incongruent audio-tactile condition, we presented sentences in an auditory only and a congruent audio-tactile condition, with the congruent tactile stimulus providing low-frequency envelope information via a vibrating probe on the index fingertip. The study involved 23 NH listeners and 14 CI users. In both groups, significant tactile enhancements were observed for congruent tactile stimuli (5.3% for NH and 5.4% for CI participants), but not for incongruent tactile stimulation. These findings replicate previously observed tactile enhancement effects. Juxtaposing our study with previous research, the informational content of the tactile stimulus emerges as a modulator of intelligibility: Generally, congruent stimuli enhanced, non-matching tactile stimuli reduced, and neutral stimuli did not change test outcomes. We conclude that the temporal cues provided by congruent vibrotactile stimuli may aid in parsing continuous speech signals into syllables and words, consequently leading to the observed improvements in intelligibility.
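The kind of low-frequency envelope signal described above can be sketched as a Hilbert envelope followed by a low-pass filter. The cutoff frequency and filter order below are assumptions; the study's actual tactile signal processing may differ.

```python
# Hedged sketch of deriving a low-frequency speech envelope of the kind used to
# drive a vibrotactile probe.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def lowfreq_envelope(speech, fs, cutoff_hz=10.0, order=4):
    """Broadband Hilbert envelope of the waveform, low-pass filtered so that
    mainly syllable-rate fluctuations remain."""
    env = np.abs(hilbert(speech))
    b, a = butter(order, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, env)
```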
Project description: The perceptual consequences of rate reduction, increased vocal intensity, and clear speech were studied in speakers with multiple sclerosis (MS), Parkinson's disease (PD), and healthy controls. Seventy-eight speakers read sentences in habitual, clear, loud, and slow conditions. Sentences were equated for peak amplitude and mixed with multitalker babble for presentation to listeners. Using a computerized visual analog scale, listeners judged intelligibility or speech severity as operationally defined in Sussman and Tjaden (2012). Loud and clear but not slow conditions improved intelligibility relative to the habitual condition. With the exception of the loud condition for the PD group, speech severity did not improve above habitual and was reduced relative to habitual in some instances. Intelligibility and speech severity were strongly related, but relationships for disordered speakers were weaker in clear and slow conditions versus habitual. Both clear and loud speech show promise for improving intelligibility and maintaining or improving speech severity in multitalker babble for speakers with mild dysarthria secondary to MS or PD, at least as these perceptual constructs were defined and measured in this study. Although scaled intelligibility and speech severity overlap, the metrics further appear to have some separate value in documenting treatment-related speech changes.