Development and validation of the Mandarin speech perception test.
ABSTRACT: Currently there are few standardized speech testing materials for Mandarin-speaking cochlear implant (CI) listeners. In this study, Mandarin speech perception (MSP) sentence test materials were developed and validated in normal-hearing subjects listening to acoustic simulations of CI processing. Percent distribution of vowels, consonants, and tones within each MSP sentence list was similar to that observed across commonly used Chinese characters. There was no significant difference in sentence recognition across sentence lists. Given the phonetic balancing within lists and the validation with spectrally degraded speech, the present MSP test materials may be useful for assessing speech performance of Mandarin-speaking CI listeners.
Project description:Although children develop categorical speech perception at a very young age, the maturation process remains unclear. A cross-sectional study in Mandarin-speaking 4-, 6-, and 10-year-old children, 14-year-old adolescents, and adults (n = 104, 56 males, all Asians from mainland China) was conducted to investigate the development of categorical perception of four Mandarin phonemic contrasts: lexical tone contrast Tone 1-2, vowel contrast /u/-/i/, consonant aspiration contrast /p/-/pʰ/, and consonant formant transition contrast /p/-/t/. The results indicated that different types of phonemic contrasts, and even the identification and discrimination of the same phonemic contrast, matured asynchronously. The observation that tone and vowel perception are achieved earlier than consonant perception supports the phonological saliency hypothesis.
Project description:A new sentence recognition test in Mandarin Chinese was developed and validated following the principles and procedures of development of the English AzBio sentence materials. The study was conducted in two stages. In the first stage, 1,020 sentences spoken by 4 talkers (2 males and 2 females) were processed through a 5-channel noise vocoder and presented to 17 normal-hearing Mandarin-speaking adults for recognition. A total of 600 sentences (150 from each talker) in the range of approximately 62 to 92% correct (mean = 78.0% correct) were subsequently selected to compile thirty 20-sentence lists. In the second stage, 30 adult CI users were recruited to verify the list equivalency. A repeated-measures analysis of variance followed by the post hoc Tukey test revealed that 26 of the 30 lists were equivalent. Finally, a binomial distribution model was adopted to account for the inherent variability in the lists. It was found that the inter-list variability could be best accounted for with a 65-item binomial distribution model. The lower and upper limits of the 95% critical differences for one- and two-list recognition scores were then generated to provide guidance for detection of a significant difference in recognition scores in clinical settings. The final set of 26 equivalent lists contains sentence materials more difficult than those found in other speech audiometry materials in Mandarin Chinese. This test should help minimize ceiling effects when testing sentence recognition in Mandarin-speaking CI users.
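The binomial reasoning above can be sketched numerically. The following is a minimal illustration, not the study's actual procedure: the function name, the quantile method, and the use of the 65-item count as the binomial n are assumptions. Given an observed one-list score, it returns the central 95% range (in percent correct) within which a retest score would be expected to fall by chance alone.

```python
from math import comb

def critical_range(score_pct, n=65, alpha=0.05):
    """Central (1 - alpha) range, in percent correct, of a retest score
    under a binomial(n, p) model, where p is the observed proportion.
    Scores outside this range differ significantly from the original."""
    p = score_pct / 100.0
    lo = hi = None
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1.0 - p)**(n - k)
        if lo is None and cdf >= alpha / 2:
            lo = k                      # lower 2.5% quantile
        if hi is None and cdf >= 1.0 - alpha / 2:
            hi = k                      # upper 97.5% quantile
            break
    return 100.0 * lo / n, 100.0 * hi / n

# e.g., for the mean first-stage score of 78% correct:
lo, hi = critical_range(78.0)
```

For a 78% score this yields roughly a 68-88% window, which illustrates why per-item binomial variability matters when interpreting single-list scores clinically.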
Project description:Tonal languages make use of pitch variation for distinguishing lexical semantics, and their melodic richness seems comparable to that of music. The present study investigated a novel priming effect of melody on the pitch processing of Mandarin speech. When a spoken Mandarin utterance is preceded by a musical melody, which mimics the melody of the utterance, the listener is likely to perceive this utterance as song. We used functional magnetic resonance imaging to examine the neural substrates of this speech-to-song transformation. Pitch contours of spoken utterances were modified so that these utterances can be perceived as either speech or song. When modified speech (target) was preceded by a musical melody (prime) that mimics the speech melody, a task of judging the melodic similarity between the target and prime was associated with increased activity in the inferior frontal gyrus (IFG) and superior/middle temporal gyrus (STG/MTG) during target perception. We suggest that the pars triangularis of the right IFG may allocate attentional resources to the multi-modal processing of speech melody, and the STG/MTG may integrate the phonological and musical (melodic) information of this stimulus. These results are discussed in relation to subvocal rehearsal, a speech-to-song illusion, and song perception.
Project description:Objective: This paper reviewed the literature on the development of and factors affecting speech perception of Mandarin-speaking children with cochlear implants (CIs). We also summarized speech outcome measures in standard Mandarin for evaluating auditory and speech perception of children with CIs. Method: A comprehensive search of Google Scholar and PubMed was conducted from March to June 2021. Search terms used were speech perception/lexical tone recognition/auditory perception AND cochlear implant AND Mandarin/Chinese. Conclusion: Unilateral CI recipients demonstrated continuous improvements in auditory and speech perception for several years post-activation. Younger age at implantation and longer duration of CI use contribute to better speech perception. Having undergone a hearing aid trial before implantation and having caregivers with a higher educational level may also lead to better performance. While the findings that support the use of CIs to improve speech perception continue to grow, much research is needed to validate the use of unilateral and bilateral implantation. Evidence to date, however, revealed bimodal benefits over CI-only conditions in lexical tone recognition and sentence perception in noise. Due to the scarcity of research, conclusions on the benefits of bilateral CIs compared to unilateral CI or bimodal CI use cannot be drawn. Therefore, future research on bimodal and bilateral CIs is needed to guide evidence-based clinical practice.
Project description:Drug addiction can cause severe damage to the human brain, leading to significant problems in cognitive processing, such as irritability, speech distortions, and exaggeration of negative stimuli. Speech plays a fundamental role in social interaction, including both production and perception. The ability to perceive communicative functions conveyed through speech is crucial for successful interpersonal communication and the maintenance of good social relationships. However, due to the limited number of previous studies, it remains unclear whether the cognitive disorder caused by drug addiction affects the perception of communicative functions conveyed in Mandarin speech. To address this question, we conducted a perception experiment involving sixty male participants, including 25 heroin addicts and 35 healthy controls. The experiment examined the perception of three communicative functions (i.e., statement, interrogative, and imperative) under three background noise conditions (i.e., no noise, SNR [Signal to Noise Ratio] = 10, and SNR = 0). Eight target sentences were first recorded by two native Mandarin speakers for each of the three communicative functions. These recordings were then combined with Gaussian white noise to create the two noisy conditions (SNR = 10 and SNR = 0). Finally, 48 speech stimuli were included in the experiment, with four options provided for perceptual judgment. The results showed that, under the three noise conditions, the average perceptual accuracies across the three communicative functions were 80.66% and 38% for the control group and the heroin addicts, respectively. Significant differences were found in the perception of the three communicative functions between the control group and the heroin addicts under the three noise conditions, except for the recognition of imperative under the strong-noise condition (i.e., SNR = 0).
Moreover, heroin addicts showed good accuracy (around 50%) in recognizing imperative and poor accuracy (i.e., lower than the chance level) in recognizing interrogative. This paper not only fills the research gap in the perception of communicative functions in Mandarin speech among drug addicts but also enhances the understanding of the effects of drugs on speech perception and provides a foundation for the speech rehabilitation of drug addicts.
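The noise-mixing step described above (clean speech combined with Gaussian white noise at SNR = 10 and SNR = 0 dB) amounts to scaling the noise so that the speech-to-noise power ratio hits the target before adding it to the signal. A minimal sketch, assuming samples are plain float lists; `mix_at_snr` is a hypothetical helper, not the study's actual tooling:

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals
    `snr_db`, then add it sample-by-sample to `speech`."""
    p_speech = sum(s * s for s in speech) / len(speech)   # mean speech power
    p_noise = sum(n * n for n in noise) / len(noise)      # mean noise power
    target = p_speech / 10.0 ** (snr_db / 10.0)           # desired noise power
    gain = math.sqrt(target / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]

# Example: a sine "utterance" mixed with Gaussian white noise at SNR = 10 dB.
rng = random.Random(0)
speech = [math.sin(0.05 * i) for i in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
mixed = mix_at_snr(speech, noise, 10.0)
```

At SNR = 0 dB the scaled noise power equals the speech power, which is why that condition is the hardest listening situation in the experiment.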
Project description:Musical training confers advantages in speech-sound processing, which could play an important role in early childhood education. To understand the mechanisms of this effect, we used event-related potential and behavioral measures in a longitudinal design. Seventy-four Mandarin-speaking children aged 4 to 5 years were pseudorandomly assigned to piano training, reading training, or a no-contact control group. Six months of piano training improved behavioral auditory word discrimination in general as well as word discrimination based on vowels compared with the controls. The reading group yielded similar trends. However, the piano group demonstrated unique advantages over the reading and control groups in consonant-based word discrimination and in enhanced positive mismatch responses (pMMRs) to lexical tone and musical pitch changes. The improved word discrimination based on consonants correlated with the enhancements in musical pitch pMMRs among the children in the piano group. In contrast, all three groups improved equally on general cognitive measures, including tests of IQ, working memory, and attention. The results suggest strengthened common sound processing across domains as an important mechanism underlying the benefits of musical training on language processing. In addition, although we failed to find far-transfer effects of musical training to general cognition, the near-transfer effects to speech perception establish the potential for musical training to help children improve their language skills. Piano training was not inferior to reading training on direct tests of language function, and it even seemed superior to reading training in enhancing consonant discrimination.
Project description:Infant directed speech (IDS) is a speech register characterized by simpler sentences, a slower rate, and more variable prosody. Recent work has implicated it in more subtle aspects of language development. Kuhl et al. (1997) demonstrated that segmental cues for vowels are affected by IDS in a way that may enhance development: the average locations of the extreme "point" vowels (/a/, /i/ and /u/) are further apart in acoustic space. If infants learn speech categories, in part, from the statistical distributions of such cues, these changes may specifically enhance speech category learning. We revisited this by asking (1) if these findings extend to a new cue (Voice Onset Time, a cue for voicing); (2) whether they extend to the interior vowels which are much harder to learn and/or discriminate; and (3) whether these changes may be an unintended phonetic consequence of factors like speaking rate or prosodic changes associated with IDS. Eighteen caregivers were recorded reading a picture book including minimal pairs for voicing (e.g., beach/peach) and a variety of vowels to either an adult or their infant. Acoustic measurements suggested that VOT was different in IDS, but not in a way that necessarily supports better development, and that these changes are almost entirely due to slower rate of speech of IDS. Measurements of the vowel suggested that in addition to changes in the mean, there was also an increase in variance, and statistical modeling suggests that this may counteract the benefit of any expansion of the vowel space. As a whole this suggests that changes in segmental cues associated with IDS may be an unintended by-product of the slower rate of speech and different prosodic structure, and do not necessarily derive from a motivation to enhance development.
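The trade-off described above (wider category means but inflated within-category variance in IDS) can be made concrete with a simple separability index, d' = |μ1 − μ2| / σ, for two equal-variance cue distributions. The formant values below are made up for illustration, not the study's data:

```python
def d_prime(mu1, mu2, sd):
    """Separability of two equal-variance Gaussian cue distributions:
    larger values mean the categories are easier to discriminate."""
    return abs(mu1 - mu2) / sd

# Hypothetical F1 means (Hz) for two vowel categories. IDS pushes the
# means apart but also inflates within-category variance, so the
# categories can end up *less* separable than in adult-directed speech.
ads_sep = d_prime(500.0, 700.0, 80.0)    # ADS: closer means, tight spread
ids_sep = d_prime(450.0, 780.0, 160.0)   # IDS: wider means, wide spread
```

With these toy numbers the IDS separation (330 Hz vs. 200 Hz) is larger, yet the doubled spread leaves the categories less discriminable (d' of about 2.06 vs. 2.5), mirroring the paper's statistical-modeling argument.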
Project description:Harmonic and temporal fine structure (TFS) information are important cues for speech perception in noise and music perception. However, due to the inherently coarse spectral and temporal resolution in electric hearing, the question of how to deliver harmonic and TFS information to cochlear implant (CI) users remains unresolved. A harmonic single-sideband encoder (HSSE) strategy [Nie et al. (2008) and Lie et al. (2010), Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing] has been proposed that explicitly tracks the harmonics in speech and transforms them into modulators conveying both amplitude modulation and fundamental frequency information. For unvoiced speech, HSSE transforms the TFS into a slowly varying yet still noise-like signal. To investigate its potential, four- and eight-channel vocoder simulations of the HSSE and continuous-interleaved-sampling (CIS) strategies were implemented. Using these vocoders, five normal-hearing subjects' speech recognition performance was evaluated under different masking conditions; another five normal-hearing subjects' Mandarin tone identification performance was also evaluated. Additionally, the neural discharge patterns evoked by HSSE- and CIS-encoded Mandarin tone stimuli were simulated using an auditory nerve model. All subjects scored significantly higher with HSSE than with CIS vocoders. The modeling analysis demonstrated that HSSE can convey temporal pitch cues better than CIS. Overall, the results suggest that HSSE is a promising strategy to enhance speech perception with CIs.
Project description:This study examined whether language specific properties may lead to cross-language differences in the degree of phonetic reduction. Rates of syllabic reduction (defined here as reduction in which the number of syllables pronounced is less than expected based on canonical form) in English and Mandarin were compared. The rate of syllabic reduction was higher in Mandarin than English. Regardless of language, open syllables participated in reduction more often than closed syllables. The prevalence of open syllables was higher in Mandarin than English, and this phonotactic difference could account for Mandarin's higher rate of syllabic reduction.