Effects of cross-language voice training on speech perception: whose familiar voices are more intelligible?
ABSTRACT: Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, and thus learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus learned only language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners identified more phonemes correctly for familiar than for unfamiliar talkers, whereas German-trained listeners did not show improved intelligibility for familiar talkers. The absence of an intelligibility advantage for the German-trained listeners demonstrates a limitation on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers.
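Intelligibility in tasks like this is often scored as the proportion of phonemes reported correctly. A minimal sketch of position-wise phoneme scoring for short words follows; the scoring convention and the example transcriptions are illustrative assumptions, not taken from the study:

```python
def proportion_phonemes_correct(target, response):
    """Position-by-position phoneme match between a target word and a
    listener's response, both given as sequences of phoneme symbols.
    Assumes equal-length CVC-style items; real scoring schemes may
    instead align sequences before comparing."""
    matches = sum(t == r for t, r in zip(target, response))
    return matches / len(target)

# Hypothetical trial: target /b i d/, response /b i t/ -> 2 of 3 phonemes
score = proportion_phonemes_correct(["b", "i", "d"], ["b", "i", "t"])
```

Averaging such per-word scores over familiar-talker and unfamiliar-talker trials gives the per-condition intelligibility values that analyses of the Familiar Talker Advantage compare.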
Project description: Previous studies have shown that listeners are better able to understand speech when they are familiar with the talker's voice. In most of these studies, talker familiarity was ensured by explicit voice training; that is, listeners learned to identify the familiar talkers. In the real world, however, the characteristics of familiar talkers are learned incidentally, through communication. The present study investigated whether speech comprehension benefits from implicit voice training, that is, from exposure to talkers' voices without listeners explicitly trying to identify them. During four training sessions, listeners heard short sentences containing a single verb (e.g., "he writes"), spoken by one talker. The sentences were mixed with noise, and listeners identified the verb within each sentence while their speech-reception thresholds (SRTs) were measured. In a final test session, listeners performed the same task, but this time they heard different sentences spoken by the familiar talker and three unfamiliar talkers. Familiar and unfamiliar talkers were counterbalanced across listeners. Half of the listeners performed a test session in which the four talkers were presented in separate blocks (blocked paradigm); for the other half, talkers varied randomly from trial to trial (interleaved paradigm). The results showed that listeners had lower SRTs when the speech was produced by the familiar talker than by the unfamiliar talkers. The type of talker presentation (blocked vs. interleaved) had no effect on this familiarity benefit. These findings suggest that listeners implicitly learn talker-specific information during a speech-comprehension task and exploit this information to improve the comprehension of novel speech material from familiar talkers.
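SRTs of this kind are typically measured with an adaptive staircase that lowers the signal-to-noise ratio after a correct response and raises it after an error, so the track hovers near the 50%-correct point. The following is a minimal 1-up/1-down sketch; the step size, trial count, threshold estimator, and the simulated logistic listener are illustrative assumptions, not the procedure reported in the study:

```python
import math
import random

def track_srt(prob_correct_at, start_snr=0.0, step_db=2.0, n_trials=40, seed=1):
    """Simple 1-up/1-down adaptive track: the SNR drops by step_db after a
    correct response and rises by step_db after an error, so the track
    oscillates around the SNR where prob_correct_at(snr) == 0.5."""
    rng = random.Random(seed)
    snr = start_snr
    history = []
    for _ in range(n_trials):
        history.append(snr)
        correct = rng.random() < prob_correct_at(snr)
        snr -= step_db if correct else -step_db
    # Crude threshold estimate: mean SNR over the second half of the track
    tail = history[n_trials // 2:]
    return sum(tail) / len(tail)

# Illustrative simulated listener: logistic psychometric function whose
# 50%-correct point (the "true" SRT) sits at -6 dB SNR
listener = lambda snr: 1.0 / (1.0 + math.exp(-(snr + 6.0)))
estimate = track_srt(listener)
```

A lower estimated SRT for the familiar talker than for the unfamiliar talkers is exactly the familiarity benefit described above. Real procedures often refine this sketch by averaging reversal points and shrinking the step size over the run.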
Project description: Listeners identify talkers more accurately when listening to their native language compared to an unfamiliar, foreign language. This language-familiarity effect in talker identification has been shown to arise from familiarity with both the sound patterns (phonetics and phonology) and the linguistic content (words) of one's native language. However, it has been unknown whether these two sources of information contribute independently to talker identification abilities, particularly whether hearing familiar words can facilitate talker identification in the absence of familiar phonetics. To isolate the contribution of lexical familiarity, we conducted three experiments that tested listeners' ability to identify talkers saying familiar words, but with unfamiliar phonetics. In two experiments, listeners identified talkers from recordings of their native language (English), an unfamiliar foreign language (Mandarin Chinese), or "hybrid" speech stimuli (sentences spoken in Mandarin, but which can be convincingly coerced to sound like English when presented with subtitles that prime plausible English-language lexical interpretations based on the Mandarin phonetics). In a third experiment, we explored natural variation in lexical-phonetic congruence as listeners identified talkers with varying degrees of a Mandarin accent. Priming listeners to hear English speech did not improve their ability to identify talkers speaking Mandarin, even after additional training, and talker identification accuracy decreased as talkers' phonetics became increasingly dissimilar to American English. Together, these experiments indicate that unfamiliar sound patterns preclude talker identification benefits otherwise afforded by familiar words. These results suggest that linguistic representations contribute hierarchically to talker identification; the facilitatory effect of familiar words requires the availability of familiar phonological forms.
Project description: Purpose This preliminary investigation compared effects of time compression on intelligibility for male versus female talkers. We hypothesized that time compression would have a greater effect for female talkers. Method Sentence materials from four talkers (two males) were time compressed, and original-speed and time-compressed speech materials were presented in a background of 12-talker babble to young adult listeners with normal hearing. Each talker/processing condition was heard by eight listeners (total N = 64). Generalized linear mixed-effects models were used to determine the effects of processing condition and talker sex, and their interaction, on keyword intelligibility. Additional post hoc analyses examined whether processing condition effects were related to talker vowel space and word frequency. Results For original-speed sentences, female and male talkers were essentially equally intelligible. Time compression reduced intelligibility for all talkers, but the effect was significantly greater for the female talkers. Supplementary analyses revealed that the effect of time compression depended on both talker vowel space and word frequency: The detrimental effect decreased significantly as word frequency and vowel space increased. Word frequency effects were also greater overall for talkers with larger vowel spaces. Conclusions While the small talker sample limits conclusions about the effects of talker sex, the secondary analyses suggest that intelligibility of talkers with larger vowel spaces is less susceptible to the negative effect of time compression, especially for high-frequency words.
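Talker vowel space is commonly quantified as the area of the polygon formed by corner vowels in F1-F2 space. A minimal shoelace-formula sketch follows; the formant values in the example are illustrative placeholders, not data from the study:

```python
def vowel_space_area(corner_vowels):
    """Area of the vowel polygon in F1-F2 space, via the shoelace formula.
    `corner_vowels` is a list of (F1, F2) pairs in Hz, ordered around the
    polygon's perimeter."""
    n = len(corner_vowels)
    area = 0.0
    for i in range(n):
        f1_a, f2_a = corner_vowels[i]
        f1_b, f2_b = corner_vowels[(i + 1) % n]
        area += f1_a * f2_b - f1_b * f2_a
    return abs(area) / 2.0

# Hypothetical corner vowels /i/, /ae/, /a/, /u/ for an illustrative talker
area_hz2 = vowel_space_area([(300, 2300), (700, 1800), (750, 1100), (350, 900)])
```

Under the abstract's supplementary analyses, larger values of such an area measure predicted smaller intelligibility losses under time compression.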
Project description: Purpose Previous studies with children and adults have demonstrated a familiar talker advantage: better word recognition for familiar talkers. The goal of the current study was to test whether this phenomenon is modulated by a child's language ability. Method Sixty children with a range of language ability were trained to learn the voices of 3 foreign-accented, German-English bilingual talkers and received feedback about their performance. Both before and after this talker voice training, children completed a spoken word recognition task in which they heard consonant-vowel-consonant words mixed with noise that were spoken by the 3 familiarized talkers and by 3 unfamiliar German-English bilinguals. Results Two findings emerged from this study: First, children with both higher and lower language ability performed similarly on the familiarized talkers. Second, children with higher language scores performed similarly on both the familiarized and unfamiliar talkers, whereas children with lower language scores performed worse on the unfamiliar talkers than on the familiar talkers, suggesting an inability to generalize to novel, unfamiliar talkers who spoke with a similar accent. Discussion Together, these findings indicate that children with higher language scores are able to generalize knowledge about foreign-accented talkers to help spoken word recognition for novel talkers with the same accent. In contrast, children with lower language skills did not exhibit the same magnitude of generalization. This lack of generalization to similar talkers may mean that children with lower language skills are at a disadvantage in spoken language tasks because they are unable to process speech as well when listening to unfamiliar talkers.
Project description: The ability to recognize people by their voice is an important social behavior. Individuals differ in how they pronounce words, and listeners may take advantage of language-specific knowledge of speech phonology to facilitate recognizing voices. Impaired phonological processing is characteristic of dyslexia and thought to be a basis for difficulty in learning to read. We tested voice-recognition abilities of dyslexic and control listeners for voices speaking listeners' native language or an unfamiliar language. Individuals with dyslexia exhibited impaired voice-recognition abilities compared with controls only for voices speaking their native language. These results demonstrate the importance of linguistic representations for voice recognition. Humans appear to identify voices by making comparisons between talkers' pronunciations of words and listeners' stored abstract representations of the sounds in those words.
Project description: Purpose Older native speakers of English have difficulty in understanding Spanish-accented English compared to younger native English speakers. However, it is unclear if this age effect would be observed among native speakers of Spanish. The current study investigates the effects of age and native language experience with Spanish on the ability to recognize words spoken in English by Spanish-accented and unaccented talkers. Method English monosyllabic words, recorded by native speakers of English and Spanish, were presented to 4 groups of listeners with normal hearing: younger native Spanish listeners (n = 15), older native Spanish listeners (n = 16), younger native English listeners (n = 15), and older native English listeners (n = 15). Speech recognition accuracy was assessed for the unaccented and accented words in both quiet and noise. Results In all conditions, the native English listeners performed better than the native Spanish listeners. More specifically, the native speakers of Spanish consistently recognized accented English less accurately than the native speakers of English, demonstrating no advantage of shared native language experience between nonnative listeners and accented talkers. Older listeners in the native Spanish language group also performed less accurately than their younger counterparts, for English words spoken by both unaccented and accented talkers. Finally, whereas listeners who were native speakers of English showed marked declines in recognition of Spanish-accented English relative to unaccented English, listeners who were native speakers of Spanish (both younger and older) showed less decline. Conclusions The general pattern of results suggests that both native language experience in a language other than English and age limit the ability to recognize Spanish-accented English. The implication of the overall findings is that older nonnative listeners will have considerable difficulty in understanding English, regardless of the talker's accent, in both clinical and everyday listening situations.
Project description: To understand spoken words, listeners must appropriately interpret co-occurring talker characteristics and speech sound content. This ability was tested in 6- to 14-month-olds by measuring their looking to named food and body part images. In the new talker condition (n = 90), pictures were named by an unfamiliar voice; in the mispronunciation condition (n = 98), infants' mothers "mispronounced" the words (e.g., nazz for nose). Six- to 7-month-olds fixated target images above chance across conditions, understanding novel talkers and mothers' phonologically deviant speech equally well. Eleven- to 14-month-olds also understood new talkers, but performed poorly with mispronounced speech, indicating sensitivity to phonological deviation. Between these ages, performance was mixed. These findings highlight the changing roles of acoustic and phonetic variability in early word comprehension, as infants learn which variations alter meaning.
Project description: Efficient speech perception requires listeners to maintain an exquisite tension between stability of the language architecture and flexibility to accommodate variation in the input, such as that associated with individual talker differences in speech production. Achieving this tension can be guided by top-down learning mechanisms, wherein lexical information constrains interpretation of speech input, and by bottom-up learning mechanisms, in which distributional information in the speech signal is used to optimize the mapping to speech sound categories. An open question for theories of perceptual learning concerns the nature of the representations that are built for individual talkers: do these representations reflect long-term, global exposure to a talker or rather only short-term, local exposure? Recent research suggests that when lexical knowledge is used to resolve a talker's ambiguous productions, listeners disregard previous experience with a talker and instead rely on only recent experience, a finding that is contrary to predictions of Bayesian belief-updating accounts of perceptual adaptation. Here we use a distributional learning paradigm in which lexical information is not explicitly required to resolve ambiguous input to provide an additional test of global versus local exposure accounts. Listeners completed two blocks of phonetic categorization for stimuli that differed in voice onset time (VOT), a probabilistic cue to the voicing contrast in English stop consonants. In each block, two distributions were presented, one specifying /g/ and one specifying /k/. Across the two blocks, variance of the distributions was manipulated to be either narrow or wide. The critical manipulation was order of the two blocks; half of the listeners were first exposed to the narrow distributions followed by the wide distributions, with the order reversed for the other half of the listeners.
The results showed that for earlier trials, the identification slope was steeper for the narrow-wide group compared to the wide-narrow group, but this difference was attenuated for later trials. The between-group convergence was driven by an asymmetry in learning between the two orders such that only those in the narrow-wide group showed slope movement during exposure, a pattern that was mirrored by computational simulations in which the distributional statistics of the present talker were integrated with prior experience with English. This pattern of results suggests that listeners did not disregard all prior experience with the talker, and instead used cumulative exposure to guide phonetic decisions, which raises the possibility that accommodating a talker's phonetic signature entails maintaining representations that reflect global experience.
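The link between distribution variance and identification slope can be illustrated with an ideal observer: given equal priors and equal-variance Gaussian VOT distributions for /g/ and /k/, the posterior probability of /k/ is logistic in VOT with slope proportional to 1/variance, so narrow exposure distributions predict steeper categorization functions. A minimal sketch follows; the means and standard deviations are illustrative, not the study's stimulus parameters:

```python
import math

def p_respond_k(vot, mu_g, mu_k, sigma):
    """Ideal-observer probability of categorizing a token as /k/, assuming
    equal priors and equal-variance Gaussian VOT distributions for the two
    categories. Algebraically this reduces to a logistic function of VOT
    with slope (mu_k - mu_g) / sigma**2."""
    log_lik_g = -((vot - mu_g) ** 2) / (2.0 * sigma ** 2)
    log_lik_k = -((vot - mu_k) ** 2) / (2.0 * sigma ** 2)
    return 1.0 / (1.0 + math.exp(log_lik_g - log_lik_k))

# With illustrative means of 10 ms (/g/) and 60 ms (/k/), narrow-variance
# exposure (sigma = 8) yields a steeper function than wide (sigma = 16)
p_narrow = p_respond_k(45.0, mu_g=10.0, mu_k=60.0, sigma=8.0)
p_wide = p_respond_k(45.0, mu_g=10.0, mu_k=60.0, sigma=16.0)
```

Under a belief-updating account, the sigma a listener effectively uses would pool prior experience with the talker's recent distributional statistics, which is one way to capture the cumulative-exposure pattern described above.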
Project description: Children seem able to efficiently interpret a variety of linguistic cues during speech comprehension, yet have difficulty interpreting sources of nonlinguistic and paralinguistic information that accompany speech. The current study asked whether (paralinguistic) voice-activated role knowledge is rapidly interpreted in coordination with a linguistic cue (a sentential action) during speech comprehension in an eye-tracked sentence comprehension task with children (ages 3-10 years) and college-aged adults. Participants were initially familiarized with 2 talkers who identified their respective roles (e.g., PRINCESS and PIRATE) before hearing a previously introduced talker name an action and object ("I want to hold the sword," in the pirate's voice). As the sentence was spoken, eye movements were recorded to 4 objects that varied in relationship to the sentential talker and action (target: SWORD, talker-related: SHIP, action-related: WAND, and unrelated: CARRIAGE). The task was to select the named image. Even young child listeners rapidly combined inferences about talker identity with the action, allowing them to fixate on the target before it was mentioned, although there were developmental and vocabulary differences on this task. Results suggest that children, like adults, store real-world knowledge of a talker's role and actively use this information to interpret speech.
Project description: The ability of native and non-native speakers to enhance intelligibility of target vowels by speaking clearly was compared across three talker groups: monolingual English speakers and native Spanish speakers with either an earlier or a later age of immersion in an English-speaking environment. Talkers produced the target syllables "bead, bid, bayed, bed, bad" and "bod" in conversational and clear speech styles. The stimuli were presented to native English-speaking listeners in multi-talker babble with signal-to-noise ratios of -8 dB for the monolingual and early learners and -4 dB for the later learners. The monolinguals and early learners of English showed a similar average clear speech benefit, and the early learners showed equal or greater intelligibility than monolinguals for most target vowels. The 4-dB difference in signal-to-noise ratio yielded approximately equal average intelligibility for the monolinguals and later learners. The average clear speech benefit was smallest for the later learners, and a significant clear speech decrement was obtained for the target syllable "bid." These results suggest that later learners of English as a second language may be less able than monolinguals to accommodate listeners in noisy environments, due to a reduced ability to improve intelligibility by speaking more clearly.