Young Infants' Word Comprehension Given an Unfamiliar Talker or Altered Pronunciations
ABSTRACT: To understand spoken words, listeners must appropriately interpret co-occurring talker characteristics and speech sound content. This ability was tested in 6- to 14-month-olds by measuring their looking to named food and body part images. In the new talker condition (n = 90), pictures were named by an unfamiliar voice; in the mispronunciation condition (n = 98), infants' mothers "mispronounced" the words (e.g., nazz for nose). Six- to 7-month-olds fixated target images above chance across conditions, understanding novel talkers and their mothers' phonologically deviant speech equally well. Eleven- to 14-month-olds also understood new talkers, but performed poorly with mispronounced speech, indicating sensitivity to phonological deviation. Between these ages, performance was mixed. These findings highlight the changing roles of acoustic and phonetic variability in early word comprehension, as infants learn which variations alter meaning.
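Above-chance looking of this kind is typically assessed by comparing each age group's proportion of target fixation against the chance level set by the two-image display. A minimal sketch of such a comparison, assuming a hypothetical per-infant table (the file name and the columns age_group and prop_target are illustrative, not from the original study):

    # One-sample t-tests of target-looking proportions against chance (0.5),
    # run separately for each age group. Column names are hypothetical.
    import pandas as pd
    from scipy import stats

    df = pd.read_csv("looking_data.csv")  # one row per infant
    for group, sub in df.groupby("age_group"):
        t, p = stats.ttest_1samp(sub["prop_target"], popmean=0.5)
        print(f"{group}: mean={sub['prop_target'].mean():.3f}, t={t:.2f}, p={p:.4f}")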
Project description: Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correctly for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers.
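Scoring "phonemes correct" generally requires aligning the response transcription with the target word's phoneme sequence. A minimal sketch using Python's standard-library sequence alignment (the phoneme lists and scoring rule here are illustrative assumptions, not the study's actual procedure):

    # Proportion of target phonemes recovered in the listener's response,
    # scored via longest-matching sequence alignment. Inputs are illustrative.
    from difflib import SequenceMatcher

    def phonemes_correct(target, response):
        """Fraction of target phonemes matched under alignment."""
        matcher = SequenceMatcher(None, target, response)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        return matched / len(target)

    print(phonemes_correct(["b", "o", "t"], ["b", "o", "d"]))  # 2 of 3 -> 0.666...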
Project description: Listeners identify talkers more accurately when listening to their native language compared to an unfamiliar, foreign language. This language-familiarity effect in talker identification has been shown to arise from familiarity with both the sound patterns (phonetics and phonology) and the linguistic content (words) of one's native language. However, it has remained unknown whether these two sources of information contribute independently to talker identification abilities, particularly whether hearing familiar words can facilitate talker identification in the absence of familiar phonetics. To isolate the contribution of lexical familiarity, we conducted three experiments that tested listeners' ability to identify talkers saying familiar words, but with unfamiliar phonetics. In two experiments, listeners identified talkers from recordings of their native language (English), an unfamiliar foreign language (Mandarin Chinese), or "hybrid" speech stimuli (sentences spoken in Mandarin, but which can be convincingly coerced to sound like English when presented with subtitles that prime plausible English-language lexical interpretations based on the Mandarin phonetics). In a third experiment, we explored natural variation in lexical-phonetic congruence as listeners identified talkers with varying degrees of a Mandarin accent. Priming listeners to hear English speech did not improve their ability to identify talkers speaking Mandarin, even after additional training, and talker identification accuracy decreased as talkers' phonetics became increasingly dissimilar to American English. Together, these experiments indicate that unfamiliar sound patterns preclude talker identification benefits otherwise afforded by familiar words. These results suggest that linguistic representations contribute hierarchically to talker identification; the facilitatory effect of familiar words requires the availability of familiar phonological forms.
Project description: Speech sounds are perceived relative to spectral properties of surrounding speech. For instance, target words that are ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as "bet" after a "low F1" sentence, but as "bit" after a "high F1" sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham (J. Exp. Psychol. Hum. Percept. Perform. 44(9), 1447-1457, 2018b) reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic "cocktail party" settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influence target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated based on studies without competing talkers.
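A spectral contrast effect of this kind is often quantified by regressing the binary categorization response on the context's spectral properties. A minimal sketch with statsmodels, assuming a hypothetical trial-level table (the file name and the columns resp_bet, context_f1, and attended are illustrative):

    # Logistic regression: does the attended context's mean F1 shift the
    # probability of a "bet" response? Column names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    trials = pd.read_csv("sce_trials.csv")
    model = smf.logit("resp_bet ~ context_f1 * attended", data=trials).fit()
    print(model.summary())  # a negative context_f1 slope indicates a contrast effect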
Project description: Purpose: This preliminary investigation compared effects of time compression on intelligibility for male versus female talkers. We hypothesized that time compression would have a greater effect for female talkers. Method: Sentence materials from four talkers (two males) were time compressed, and original-speed and time-compressed speech materials were presented in a background of 12-talker babble to young adult listeners with normal hearing. Each talker/processing condition was heard by eight listeners (total N = 64). Generalized linear mixed-effects models were used to determine the effects of and interaction between processing condition and talker sex on keyword intelligibility. Additional post hoc analyses examined whether processing condition effects were related to talker vowel space and word frequency. Results: For original-speed sentences, female and male talkers were essentially equally intelligible. Time compression reduced intelligibility for all talkers, but the effect was significantly greater for the female talkers. Supplementary analyses revealed that the effect of time compression depended on both talker vowel space and word frequency: The detrimental effect decreased significantly as word frequency and vowel space increased. Word frequency effects were also greater overall for talkers with larger vowel spaces. Conclusions: While the small talker sample limits conclusions about the effects of talker sex, the secondary analyses suggest that intelligibility of talkers with larger vowel spaces is less susceptible to the negative effect of time compression, especially for high-frequency words.
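The reported analysis used generalized linear mixed-effects models (commonly fit in R with lme4); a simplified Python analogue fits a linear mixed model to listener-level intelligibility scores. The formula, transformation, and column names below are assumptions, not the study's actual specification:

    # Linear mixed model on empirical-logit keyword accuracy, with random
    # intercepts for listeners. File and column names are hypothetical.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    scores = pd.read_csv("intelligibility.csv")  # one row per listener x talker
    scores["elog"] = np.log((scores["n_correct"] + 0.5) /
                            (scores["n_total"] - scores["n_correct"] + 0.5))
    model = smf.mixedlm("elog ~ processing * talker_sex", data=scores,
                        groups=scores["listener"]).fit()
    print(model.summary())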
Project description: Speech processing is slower and less accurate when listeners encounter speech from multiple talkers compared to one continuous talker. However, interference from multiple talkers has been investigated only using immediate speech recognition or long-term memory recognition tasks. These tasks reveal opposite effects of speech processing time on speech recognition: while fast processing of multi-talker speech impedes immediate recognition, it also results in more abstract and less talker-specific long-term memories for speech. Here, we investigated whether and how processing multi-talker speech disrupts working memory maintenance, an intermediate stage between perceptual recognition and long-term memory. In a digit sequence recall task, listeners encoded seven-digit sequences and recalled them after a 5-s delay. Sequences were spoken by either a single talker or multiple talkers at one of three presentation rates (0-, 200-, and 500-ms inter-digit intervals). Listeners' recall was slower and less accurate for sequences spoken by multiple talkers than a single talker. Especially for the fastest presentation rate, listeners were less efficient when recalling sequences spoken by multiple talkers. Our results reveal that talker-specificity effects for speech working memory are most prominent when listeners must rapidly encode speech. These results suggest that, like immediate speech recognition, working memory for speech is susceptible to interference from variability across talkers. While many studies ascribe effects of talker variability to the need to calibrate perception to talker-specific acoustics, these results are also consistent with the idea that a sudden change of talkers disrupts attentional focus, interfering with efficient working-memory processing.
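One common way to express "slower and less accurate" as a single efficiency measure is the inverse efficiency score (mean correct-trial response time divided by proportion correct); whether the authors used this exact measure is an assumption here. A minimal sketch:

    # Inverse efficiency score: mean correct-trial RT / proportion correct.
    # Higher values mean less efficient recall. Inputs are illustrative.
    def inverse_efficiency(rts_correct, n_correct, n_trials):
        accuracy = n_correct / n_trials
        mean_rt = sum(rts_correct) / len(rts_correct)
        return mean_rt / accuracy

    # e.g., three correct recalls (RTs in ms) out of five sequences
    print(inverse_efficiency([820, 790, 905], n_correct=3, n_trials=5))  # ~1397 ms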
Project description: Previous studies have shown that listeners are better able to understand speech when they are familiar with the talker's voice. In most of these studies, talker familiarity was ensured by explicit voice training; that is, listeners learned to identify the familiar talkers. In the real world, however, the characteristics of familiar talkers are learned incidentally, through communication. The present study investigated whether speech comprehension benefits from implicit voice training; that is, through exposure to talkers' voices without listeners explicitly trying to identify them. During four training sessions, listeners heard short sentences containing a single verb (e.g., "he writes"), spoken by one talker. The sentences were mixed with noise, and listeners identified the verb within each sentence while their speech-reception thresholds (SRTs) were measured. In a final test session, listeners performed the same task, but this time they heard different sentences spoken by the familiar talker and three unfamiliar talkers. Familiar and unfamiliar talkers were counterbalanced across listeners. Half of the listeners performed a test session in which the four talkers were presented in separate blocks (blocked paradigm). For the other half, talkers varied randomly from trial to trial (interleaved paradigm). The results showed that listeners had lower SRTs when the speech was produced by the familiar talker than the unfamiliar talkers. The type of talker presentation (blocked vs. interleaved) had no effect on this familiarity benefit. These findings suggest that listeners implicitly learn talker-specific information during a speech-comprehension task, and exploit this information to improve the comprehension of novel speech material from familiar talkers.
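SRTs are typically estimated with an adaptive staircase that lowers the signal-to-noise ratio after correct responses and raises it after errors, converging on roughly 50% intelligibility. A minimal 1-up/1-down sketch (step size, trial count, and the simulated listener are assumptions, not the study's procedure):

    # 1-up/1-down adaptive SNR track converging on ~50% intelligibility.
    # The "listener" is simulated; in practice it scores verb responses.
    import random

    def run_staircase(n_trials=30, snr=0.0, step=2.0, true_srt=-6.0):
        reversals, last_correct = [], None
        for _ in range(n_trials):
            # Simulated psychometric listener: p(correct) = 0.5 at true_srt.
            p_correct = 1 / (1 + 10 ** ((true_srt - snr) / 4))
            correct = random.random() < p_correct
            if last_correct is not None and correct != last_correct:
                reversals.append(snr)
            snr += -step if correct else step
            last_correct = correct
        if not reversals:
            return snr
        return sum(reversals[-6:]) / len(reversals[-6:])  # mean of last reversals

    print(f"Estimated SRT: {run_staircase():.1f} dB SNR")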
Project description: Perceptual recalibration allows listeners to adapt to talker-specific pronunciations, such as atypical realizations of specific sounds. Such recalibration can facilitate robust speech recognition. However, indiscriminate recalibration following any atypically pronounced words also risks treating pronunciations that are in reality due to incidental, short-lived factors (such as a speech error) as characteristic of a talker. We investigate whether the mechanisms underlying perceptual recalibration involve inferences about the causes of unexpected pronunciations. In five experiments, we ask whether perceptual recalibration is blocked if the atypical pronunciations of an unfamiliar talker can also be attributed to other incidental causes. We investigated three types of incidental causes for atypical pronunciations: the talker is intoxicated, the talker speaks unusually fast, or the atypical pronunciations occur only in the context of tongue twisters. In all five experiments, we find robust evidence for perceptual recalibration, but little evidence that the presence of incidental causes blocks perceptual recalibration. We discuss these results in light of other recent findings that incidental causes can block perceptual recalibration.
Project description: We investigated the perceptual development of lexical tones in native tone-learning infants during the first 2 years of life, focusing on two important stages of phonological acquisition: the preverbal and vocabulary explosion stages. Experiment 1 examined monolingual Mandarin-Chinese-learning 4- to 13-month-olds' discrimination of similar lexical tones in Mandarin, Tone 2 (T2, rising) vs. Tone 3 (T3, low-dipping). Infants were habituated to exemplars of one tone (either T2 or T3), and tested with new exemplars of the habituated tone vs. the contrasting tone. Results show that looking time increased for the contrasting tone, but not for new exemplars of the habituated tone, suggesting that infants discriminated the two tones as separate categories. Furthermore, infants' discrimination of the tones was comparable across ages. Experiment 2 tested whether the tones are distinguished in toddlers' lexicons. Monolingual Mandarin-learning 19- to 26-month-olds were presented with pairs of objects while one was named. Targets were familiar words bearing T2 or T3, either correctly pronounced (CP) or mispronounced (MP) in tone. We found that word recognition was equally successful in CP and in MP trials when T2 was mispronounced as T3 and T3 as T2, indicating that T2 and T3 are confusable. In contrast, recognition failed when T2 and T3 words were mispronounced as Tone 4 (T4, falling), showing that T4 was represented as a distinct category. These results show that toddlers have difficulty encoding similar tones distinctly in known words. The T2-T3 contrast is particularly challenging because of Tone 3 Sandhi, which changes T3 to T2 when it precedes another T3. At the stage when toddlers track the meaning of T2 and T3 words and track the sandhi alternations, they seem to overgeneralize the two tones as variants of one functional category, reflecting perceptual organization at the level of phonemic learning.
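In habituation designs like Experiment 1, trials continue until looking time declines to a criterion, commonly 50% of initial looking. A minimal sketch of such a criterion check over a sliding window of trials (the window size and threshold are illustrative assumptions):

    # Habituation criterion: stop when mean looking time over the last
    # `window` trials falls below `criterion` x the first-window mean.
    def is_habituated(looking_times, window=3, criterion=0.5):
        if len(looking_times) < 2 * window:
            return False
        baseline = sum(looking_times[:window]) / window
        recent = sum(looking_times[-window:]) / window
        return recent < criterion * baseline

    print(is_habituated([12.0, 10.5, 11.2, 7.0, 4.8, 4.1]))  # True: looking halved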
Project description: Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced, and simulations of words and phrases are demonstrated.
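The airway modulation model itself is not reproduced here, but the general source-filter idea it builds on can be illustrated: a glottal source filtered by vocal-tract resonances. A minimal sketch synthesizing a static vowel with second-order formant resonators (the formant frequencies and bandwidths are textbook approximations for /a/, not the model's parameters):

    # Source-filter vowel synthesis: impulse-train glottal source passed
    # through cascaded second-order resonators at formant frequencies.
    import numpy as np
    from scipy.signal import lfilter

    fs, f0, dur = 16000, 120, 0.5
    source = np.zeros(int(fs * dur))
    source[::fs // f0] = 1.0  # simple glottal pulse train

    signal = source
    for freq, bw in [(730, 90), (1090, 110), (2440, 170)]:  # approx. /a/ formants
        r = np.exp(-np.pi * bw / fs)
        a = [1, -2 * r * np.cos(2 * np.pi * freq / fs), r ** 2]
        signal = lfilter([1 - r], a, signal)  # two-pole resonator

    signal /= np.abs(signal).max()  # normalize for playback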
Project description: A corpus of stimuli has been collected to support the use of common materials across research laboratories to examine school-aged children's word recognition in speech maskers. The corpus includes (1) 773 monosyllabic words that are known to be in the lexicon of 5- and 6-year-olds and (2) seven masker passages that are based on a first-grade child's writing samples. Materials were recorded by a total of 13 talkers (8 women; 5 men). All talkers recorded two masker passages; 3 talkers (2 women; 1 man) also recorded the target words. The annotated corpus is freely available online for research purposes.