Implicit Talker Training Improves Comprehension of Auditory Speech in Noise.
ABSTRACT: Previous studies have shown that listeners are better able to understand speech when they are familiar with the talker's voice. In most of these studies, talker familiarity was ensured by explicit voice training; that is, listeners learned to identify the familiar talkers. In the real world, however, the characteristics of familiar talkers are learned incidentally, through communication. The present study investigated whether speech comprehension benefits from implicit voice training; that is, from exposure to talkers' voices without listeners explicitly trying to identify them. During four training sessions, listeners heard short sentences containing a single verb (e.g., "he writes"), spoken by one talker. The sentences were mixed with noise, and listeners identified the verb within each sentence while their speech-reception thresholds (SRTs) were measured. In a final test session, listeners performed the same task, but this time they heard different sentences spoken by the familiar talker and three unfamiliar talkers. Familiar and unfamiliar talkers were counterbalanced across listeners. Half of the listeners performed a test session in which the four talkers were presented in separate blocks (blocked paradigm). For the other half, talkers varied randomly from trial to trial (interleaved paradigm). The results showed that listeners had lower SRTs when the speech was produced by the familiar talker than by the unfamiliar talkers. The type of talker presentation (blocked vs. interleaved) had no effect on this familiarity benefit. These findings suggest that listeners implicitly learn talker-specific information during a speech-comprehension task and exploit this information to improve the comprehension of novel speech material from familiar talkers.
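The abstract does not state which adaptive procedure was used to measure the SRTs. One common choice for sentence-level SRT tracking is a one-up/one-down staircase, which converges on the SNR yielding 50% correct responses. The sketch below illustrates that procedure against a simulated listener; the function names, parameter values, and the simulated listener's psychometric function are all illustrative assumptions, not details from the study.

```python
import math
import random

def simulate_listener(snr, srt_true=-6.0, slope=1.0):
    """Hypothetical listener: probability of a correct response follows
    a logistic psychometric function centred on the true SRT (in dB)."""
    p = 1.0 / (1.0 + math.exp(-slope * (snr - srt_true)))
    return random.random() < p

def adaptive_srt(start_snr=0.0, step=2.0, n_trials=60, seed=1):
    """One-up/one-down staircase: the SNR drops after a correct response
    and rises after an error, so the track oscillates around the
    50%-correct point (the SRT)."""
    random.seed(seed)
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        snr += -step if simulate_listener(snr) else step
    # Estimate the SRT as the mean SNR over the second half of the track,
    # after the staircase has had time to converge.
    half = track[n_trials // 2:]
    return sum(half) / len(half)
```

With the simulated listener's true SRT set to -6 dB, the staircase estimate settles close to that value; real procedures often refine this with smaller step sizes after the first few reversals.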
Project description: Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, the "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, thereby learning language-specific talker information. A second group of listeners learned the same talkers from German words, and thus learned only language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners identified more phonemes correctly for familiar than for unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates a limitation of the Familiar Talker Advantage: it crucially depends on the language context in which the talkers' voices were learned. Knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers.
Project description: Listeners identify talkers more accurately when listening to their native language compared to an unfamiliar, foreign language. This language-familiarity effect in talker identification has been shown to arise from familiarity with both the sound patterns (phonetics and phonology) and the linguistic content (words) of one's native language. However, it has been unknown whether these two sources of information contribute independently to talker identification abilities, particularly whether hearing familiar words can facilitate talker identification in the absence of familiar phonetics. To isolate the contribution of lexical familiarity, we conducted three experiments that tested listeners' ability to identify talkers saying familiar words, but with unfamiliar phonetics. In two experiments, listeners identified talkers from recordings of their native language (English), an unfamiliar foreign language (Mandarin Chinese), or "hybrid" speech stimuli (sentences spoken in Mandarin, but which can be convincingly coerced to sound like English when presented with subtitles that prime plausible English-language lexical interpretations based on the Mandarin phonetics). In a third experiment, we explored natural variation in lexical-phonetic congruence as listeners identified talkers with varying degrees of a Mandarin accent. Priming listeners to hear English speech did not improve their ability to identify talkers speaking Mandarin, even after additional training, and talker identification accuracy decreased as talkers' phonetics became increasingly dissimilar to American English. Together, these experiments indicate that unfamiliar sound patterns preclude talker identification benefits otherwise afforded by familiar words. These results suggest that linguistic representations contribute hierarchically to talker identification; the facilitatory effect of familiar words requires the availability of familiar phonological forms.
Project description: To understand spoken words, listeners must appropriately interpret co-occurring talker characteristics and speech sound content. This ability was tested in 6- to 14-month-olds by measuring their looking to named food and body part images. In the new-talker condition (n = 90), pictures were named by an unfamiliar voice; in the mispronunciation condition (n = 98), infants' mothers "mispronounced" the words (e.g., nazz for nose). Six- to 7-month-olds fixated target images above chance across conditions, understanding novel talkers and their mothers' phonologically deviant speech equally well. Eleven- to 14-month-olds also understood new talkers, but performed poorly with mispronounced speech, indicating sensitivity to phonological deviation. Between these ages, performance was mixed. These findings highlight the changing roles of acoustic and phonetic variability in early word comprehension, as infants learn which variations alter meaning.
Project description: Hearing aids (HAs) only partially restore the ability of older hearing-impaired (OHI) listeners to understand speech in noise, due in large part to persistent deficits in consonant identification. Here, we investigated whether adaptive perceptual training would improve consonant identification in noise in sixteen aided OHI listeners who underwent 40 hours of computer-based training in their homes. Listeners identified 20 onset and 20 coda consonants in 9,600 consonant-vowel-consonant (CVC) syllables containing different vowels (/?/, /i/, or /u/) and spoken by four different talkers. Consonants were presented at three consonant-specific signal-to-noise ratios (SNRs) spanning a 12 dB range. Noise levels were adjusted over training sessions based on d' measures. Listeners were tested before and after training to measure (1) changes in consonant-identification thresholds using syllables spoken by familiar and unfamiliar talkers, and (2) sentence reception thresholds (SeRTs) using two different sentence tests. Consonant-identification thresholds improved gradually during training. Laboratory tests of d' thresholds showed an average improvement of 9.1 dB, with 94% of listeners showing a statistically significant training benefit. Training normalized consonant confusions and improved the thresholds of some consonants into the normal range. Benefits were equivalent for onset and coda consonants, for syllables containing different vowels, and for syllables presented at different SNRs. Greater training benefits were found for hard-to-identify consonants and for consonants spoken by familiar than by unfamiliar talkers. SeRTs, tested with simple sentences, showed less elevation than consonant-identification thresholds prior to training and failed to show significant training benefit, although SeRT improvements did correlate with improvements in consonant thresholds. We argue that the lack of SeRT improvement reflects the dominant role of top-down semantic processing in the comprehension of simple sentences and that greater transfer of benefit would be evident with more unpredictable speech material.
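The d' measures used to adjust noise levels come from signal detection theory: d' is the separation, in z-units, between the hit rate and the false-alarm rate. A minimal sketch of that computation follows; the 0.5-count (log-linear) correction shown is one common convention for handling perfect scores, not necessarily the one used in this study.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adding 0.5 to each cell (a log-linear correction) keeps the rates
    away from 0 and 1, where the inverse normal CDF diverges.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return z(hit_rate) - z(fa_rate)
```

For example, 18 hits on 20 targets with 2 false alarms on 20 non-targets gives d' of about 2.4, while equal hit and false-alarm rates (chance performance) give d' = 0.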
Project description: Purpose Previous studies with children and adults have demonstrated a familiar talker advantage: better word recognition for familiar talkers. The goal of the current study was to test whether this phenomenon is modulated by a child's language ability. Method Sixty children with a range of language ability were trained to learn the voices of 3 foreign-accented, German-English bilingual talkers and received feedback about their performance. Both before and after this talker voice training, children completed a spoken word recognition task in which they heard consonant-vowel-consonant words, mixed with noise, that were spoken by the 3 familiarized talkers and by 3 unfamiliar German-English bilinguals. Results Two findings emerged from this study. First, children with both higher and lower language ability performed similarly on the familiarized talkers. Second, children with higher language scores performed similarly on both the familiarized and unfamiliar talkers, whereas children with lower language scores performed worse on the unfamiliar talkers than on the familiar talkers, suggesting an inability to generalize to novel, unfamiliar talkers who spoke with a similar accent. Discussion Together, these findings indicate that children with higher language scores are able to generalize knowledge about foreign-accented talkers to aid spoken word recognition for novel talkers with the same accent. In contrast, children with lower language skills did not exhibit the same magnitude of generalization. This lack of generalization to similar talkers may mean that children with lower language skills are at a disadvantage in spoken language tasks because they cannot process speech as well when listening to unfamiliar talkers.
Project description: Children seem able to efficiently interpret a variety of linguistic cues during speech comprehension, yet have difficulty interpreting sources of nonlinguistic and paralinguistic information that accompany speech. The current study asked whether (paralinguistic) voice-activated role knowledge is rapidly interpreted in coordination with a linguistic cue (a sentential action) during speech comprehension in an eye-tracked sentence comprehension task with children (ages 3-10 years) and college-aged adults. Participants were initially familiarized with 2 talkers who identified their respective roles (e.g., PRINCESS and PIRATE) before hearing a previously introduced talker name an action and object ("I want to hold the sword," in the pirate's voice). As the sentence was spoken, eye movements were recorded to 4 objects that varied in relationship to the sentential talker and action (target: SWORD, talker-related: SHIP, action-related: WAND, and unrelated: CARRIAGE). The task was to select the named image. Even young child listeners rapidly combined inferences about talker identity with the action, allowing them to fixate on the target before it was mentioned, although there were developmental and vocabulary differences on this task. Results suggest that children, like adults, store real-world knowledge of a talker's role and actively use this information to interpret speech.
Project description: This investigation examined the effect of accent of target talkers and background speech maskers on listeners' ability to use cues to separate speech from noise. Differences in accent may create a disparity in the relative timing between signal and background, and such timing cues may be used to separate the target talker from the background speech masker. However, the use of this cue could be reduced for older listeners with temporal processing deficits, especially those with hearing loss. Participants were younger and older listeners with normal hearing and older listeners with hearing loss. Stimuli were IEEE sentences recorded in English by male native speakers of English and Spanish. These sentences were presented in different maskers that included speech-modulated noise and background babbles varying in talker gender and accent. Signal-to-noise ratios corresponding to 50% correct performance were measured. Results indicate that a pronounced Spanish accent limits a listener's ability to take advantage of cues to speech segregation and that a difference in accentedness between the target talker and background masker may be a useful cue for speech segregation. Older hearing-impaired listeners performed poorly in all conditions with the accented talkers.
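The "signal-to-noise ratio corresponding to 50% correct performance" is the midpoint of a psychometric function fitted to percent-correct scores measured at several SNRs. The study does not describe its fitting method; the dependency-free sketch below recovers that midpoint with a grid-search least-squares fit of a logistic function, purely for illustration (function names, grid ranges, and data are assumptions).

```python
import math

def logistic(snr, mid, slope):
    """Logistic psychometric function: proportion correct as a function
    of SNR (dB), with midpoint `mid` and steepness `slope`."""
    return 1.0 / (1.0 + math.exp(-slope * (snr - mid)))

def fit_snr50(snrs, prop_correct):
    """Grid-search least-squares fit; returns the fitted midpoint, i.e.
    the SNR at which the function crosses 50% correct."""
    best_err, best_mid = float("inf"), None
    for mid10 in range(-150, 151):       # candidate midpoints: -15.0 .. +15.0 dB
        for slope10 in range(1, 31):     # candidate slopes: 0.1 .. 3.0 per dB
            mid, slope = mid10 / 10.0, slope10 / 10.0
            err = sum((logistic(s, mid, slope) - p) ** 2
                      for s, p in zip(snrs, prop_correct))
            if err < best_err:
                best_err, best_mid = err, mid
    return best_mid
```

Feeding it proportions generated from a logistic with a -5 dB midpoint recovers -5.0 dB; in practice a maximum-likelihood fit with guess and lapse rates would typically be preferred.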
Project description: Speech processing is slower and less accurate when listeners encounter speech from multiple talkers compared to one continuous talker. However, interference from multiple talkers has been investigated only using immediate speech recognition or long-term memory recognition tasks. These tasks reveal opposite effects of speech processing time on speech recognition: while fast processing of multi-talker speech impedes immediate recognition, it also results in more abstract and less talker-specific long-term memories for speech. Here, we investigated whether and how processing multi-talker speech disrupts working memory maintenance, an intermediate stage between perceptual recognition and long-term memory. In a digit sequence recall task, listeners encoded seven-digit sequences and recalled them after a 5-s delay. Sequences were spoken by either a single talker or multiple talkers at one of three presentation rates (0-, 200-, and 500-ms inter-digit intervals). Listeners' recall was slower and less accurate for sequences spoken by multiple talkers than a single talker. Especially for the fastest presentation rate, listeners were less efficient when recalling sequences spoken by multiple talkers. Our results reveal that talker-specificity effects for speech working memory are most prominent when listeners must rapidly encode speech. These results suggest that, like immediate speech recognition, working memory for speech is susceptible to interference from variability across talkers. While many studies ascribe effects of talker variability to the need to calibrate perception to talker-specific acoustics, these results are also consistent with the idea that a sudden change of talkers disrupts attentional focus, interfering with efficient working-memory processing.
Project description: Speech sounds are perceived relative to the spectral properties of surrounding speech. For instance, target words that are ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as "bet" after a "low F1" sentence, but as "bit" after a "high F1" sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham (J. Exp. Psychol. Hum. Percept. Perform., 44(9), 1447-1457, 2018b) reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic "cocktail party" settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influence target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated from studies without competing talkers.
Project description: Purpose This preliminary investigation compared the effects of time compression on intelligibility for male versus female talkers. We hypothesized that time compression would have a greater effect for female talkers. Method Sentence materials from four talkers (two male) were time compressed, and original-speed and time-compressed speech materials were presented in a background of 12-talker babble to young adult listeners with normal hearing. Each talker/processing condition was heard by eight listeners (total N = 64). Generalized linear mixed-effects models were used to determine the effects of processing condition and talker sex, and their interaction, on keyword intelligibility. Additional post hoc analyses examined whether processing-condition effects were related to talker vowel space and word frequency. Results For original-speed sentences, female and male talkers were essentially equally intelligible. Time compression reduced intelligibility for all talkers, but the effect was significantly greater for the female talkers. Supplementary analyses revealed that the effect of time compression depended on both talker vowel space and word frequency: the detrimental effect decreased significantly as word frequency and vowel space increased. Word frequency effects were also greater overall for talkers with larger vowel spaces. Conclusions While the small talker sample limits conclusions about the effects of talker sex, the secondary analyses suggest that the intelligibility of talkers with larger vowel spaces is less susceptible to the negative effect of time compression, especially for high-frequency words.