Vowel acoustics in Parkinson's disease and multiple sclerosis: comparison of clear, loud, and slow speaking conditions.
ABSTRACT: The impact of clear speech, increased vocal intensity, and rate reduction on acoustic characteristics of vowels was compared in speakers with Parkinson's disease (PD), speakers with multiple sclerosis (MS), and healthy controls. Speakers read sentences in habitual, clear, loud, and slow conditions. Variations in clarity, intensity, and rate were elicited using magnitude production. Formant frequency values for peripheral and nonperipheral vowels were obtained at 20%, 50%, and 80% of vowel duration to derive static and dynamic acoustic measures. Intensity and duration measures were also obtained. Rate was maximally reduced in the slow condition, and vocal intensity was maximized in the loud condition. The clear condition also yielded a reduced articulatory rate and increased intensity, although to a lesser degree than the slow or loud conditions. Overall, the clear condition had the most consistent impact on vowel spectral characteristics. Spectral and temporal distinctiveness for peripheral-nonperipheral vowel pairs was largely similar across conditions. Clear speech maximized peripheral and nonperipheral vowel space areas for speakers with PD and MS while also reducing rate and increasing vocal intensity. These results suggest that a speech style focused on increasing articulatory amplitude yields the most robust changes in vowel segmental articulation.
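As a rough illustration of how a vowel space area (VSA) metric of this kind can be computed, the sketch below applies the shoelace formula to the quadrilateral spanned by mean (F1, F2) values of the corner vowels. All formant values are invented placeholders, not data from the study.

```python
# Sketch: quadrilateral vowel space area from mean F1/F2 of the corner
# vowels /i/, /ae/, /a/, /u/, using the shoelace formula.
# The formant values below are illustrative, not from the study.

def vowel_space_area(corners):
    """Area (Hz^2) of the polygon whose vertices are (F1, F2) pairs,
    listed in order around the perimeter (shoelace formula)."""
    n = len(corners)
    area = 0.0
    for i in range(n):
        f1_a, f2_a = corners[i]
        f1_b, f2_b = corners[(i + 1) % n]
        area += f1_a * f2_b - f1_b * f2_a
    return abs(area) / 2.0

# Hypothetical vowel-midpoint (50% of duration) formant means, in Hz:
habitual = [(300, 2300), (700, 1800), (750, 1100), (350, 900)]  # /i ae a u/
clear    = [(280, 2450), (780, 1850), (820, 1050), (320, 850)]

expansion = vowel_space_area(clear) / vowel_space_area(habitual)
print(f"VSA expansion in clear speech: {expansion:.2f}x")
```

A ratio above 1 indicates vowel space expansion in the clear condition relative to habitual, which is the pattern the abstract reports for the clear speaking style.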
Project description:A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically-optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied for the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
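The corner-vowel interpolation described above can be sketched as follows: the weights for the three reference consonant shapes come from the barycentric coordinates of the context vowel inside the /a/-/i/-/u/ triangle. The 2-D vowel subspace coordinates and consonant shape vectors below are made-up placeholders, not the paper's measured vocal tract shapes.

```python
# Sketch of the corner-vowel interpolation idea: a consonant's vocal tract
# target in the context of vowel V is a weighted average of its reference
# shapes in /a/, /i/, /u/ contexts, with weights given by the barycentric
# coordinates of V in the /a/-/i/-/u/ triangle of the vowel subspace.

def barycentric_weights(p, a, i, u):
    """Barycentric coordinates of 2-D point p w.r.t. triangle (a, i, u)."""
    (x, y), (xa, ya), (xi, yi), (xu, yu) = p, a, i, u
    det = (yi - yu) * (xa - xu) + (xu - xi) * (ya - yu)
    wa = ((yi - yu) * (x - xu) + (xu - xi) * (y - yu)) / det
    wi = ((yu - ya) * (x - xu) + (xa - xu) * (y - yu)) / det
    return wa, wi, 1.0 - wa - wi

def blend(shapes, weights):
    """Weighted average of same-length shape parameter vectors."""
    return [sum(w * s[k] for w, s in zip(weights, shapes))
            for k in range(len(shapes[0]))]

# Hypothetical 2-D vowel subspace positions of the corner vowels:
A, I, U = (0.0, 0.0), (1.0, 0.0), (0.5, 1.0)
# Hypothetical /d/ reference shape parameters in a-, i-, u-context:
d_shapes = [[2.0, 1.0], [0.5, 3.0], [1.0, 2.0]]

w = barycentric_weights((0.5, 0.5), A, I, U)  # placeholder context vowel
d_target = blend(d_shapes, w)
```

The weights always sum to one, so the interpolated shape stays inside the convex hull of the three measured reference shapes, which is one plausible reason the approach generalizes across context vowels.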
Project description:Auditory speech perception enables listeners to access phonological categories from speech sounds. During speech production and speech motor learning, speakers experience matched auditory and somatosensory input. Accordingly, access to phonetic units might also be provided by somatosensory information. The present study assessed whether humans can identify vowels using somatosensory feedback, without auditory feedback. A tongue-positioning task was used in which participants were required to achieve different tongue postures within the /e, ɛ, a/ articulatory range, in a procedure that was entirely nonspeech-like, involving distorted visual feedback of tongue shape. Tongue postures were measured using electromagnetic articulography. At the end of each tongue-positioning trial, subjects were required to whisper the corresponding vocal tract configuration with masked auditory feedback and to identify the vowel associated with the reached tongue posture. Masked auditory feedback ensured that vowel categorization was based on somatosensory feedback rather than auditory feedback. A separate group of subjects was required to auditorily classify the whispered sounds. In addition, we modeled the link between vowel categories and tongue postures in normal speech production with a Bayesian classifier based on the tongue postures recorded from the same speakers for several repetitions of the /e, ɛ, a/ vowels during a separate speech production task. Overall, our results indicate that vowel categorization is possible with somatosensory feedback alone, with an accuracy that is similar to the accuracy of the auditory perception of whispered sounds, and in congruence with normal speech articulation, as accounted for by the Bayesian classifier.
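A minimal sketch of a Bayesian classifier of this kind, assuming one Gaussian per vowel fitted to a posture feature recorded during normal production, is shown below. The single tongue-height coordinate and all training values are invented placeholders, and "E" stands in for the mid vowel of the study's /e/-to-/a/ range; this is not the study's actual model.

```python
# Sketch: minimal Gaussian Bayes classifier mapping a tongue posture
# feature (one hypothetical tongue-height coordinate, mm) to a vowel.
import math

def fit_gaussian(samples):
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var

def log_likelihood(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# Hypothetical repeated productions from a "separate speech production task":
training = {
    "e": [12.1, 11.8, 12.5, 12.0],
    "E": [8.9, 9.3, 8.6, 9.1],
    "a": [4.2, 4.8, 4.0, 4.5],
}
models = {v: fit_gaussian(x) for v, x in training.items()}

def classify(x):
    # Flat prior over the three vowels, so max posterior = max likelihood.
    return max(models, key=lambda v: log_likelihood(x, *models[v]))

print(classify(9.0))  # a posture near the mid-vowel cluster
```

With a flat prior, the posterior comparison reduces to picking the vowel whose fitted posture distribution makes the reached posture most likely.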
Project description:When addressing their young infants, parents systematically modify their speech. Such infant-directed speech (IDS) contains exaggerated vowel formants, which have been proposed to foster language development via articulation of more distinct speech sounds. Here, this assumption is rigorously tested using both acoustic and, for the first time, fine-grained articulatory measures. Mothers were recorded speaking to their infant and to another adult, and measures were taken of their acoustic vowel space, their tongue and lip movements and the length of their vocal tract. Results showed that infant- but not adult-directed speech contains acoustically exaggerated vowels, and these are not the product of adjustments to tongue or to lip movements. Rather, they are the product of a shortened vocal tract due to a raised larynx, which can be ascribed to speakers' unconscious effort to appear smaller and more non-threatening to the young infant. This adjustment in IDS may be a vestige of early mother-infant interactions, which had as their primary purpose the transmission of non-aggressiveness and/or a primitive manifestation of pre-linguistic vocal social convergence of the mother to her infant. With the advent of human language, this vestige then acquired a secondary purpose: facilitating language acquisition via the serendipitously exaggerated vowels.
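The shortened-vocal-tract account has a simple acoustic rationale: in the uniform quarter-wave tube approximation, every resonance scales inversely with tract length, so raising the larynx by even a centimeter shifts all formants upward by the same factor. The sketch below uses this textbook approximation with illustrative lengths, not the study's measurements.

```python
# Sketch: quarter-wave tube resonances F_n = (2n - 1) * c / (4 * L),
# showing how a shorter vocal tract raises all formants proportionally.
# Lengths are illustrative placeholders.

C = 35000.0  # approximate speed of sound in warm, moist air, cm/s

def tube_formants(length_cm, n=3):
    return [(2 * k - 1) * C / (4 * length_cm) for k in range(1, n + 1)]

adult_directed = tube_formants(15.0)   # nominal adult female tract length
infant_directed = tube_formants(14.0)  # ~1 cm shorter via a raised larynx

scale = infant_directed[0] / adult_directed[0]
print([round(f) for f in adult_directed], round(scale, 3))
```

Because the scaling is uniform across formants, the whole vowel space expands multiplicatively, which is consistent with acoustically exaggerated vowels arising without any change in tongue or lip articulation.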
Project description:The perceptual consequences of rate reduction, increased vocal intensity, and clear speech were studied in speakers with multiple sclerosis (MS), Parkinson's disease (PD), and healthy controls. Seventy-eight speakers read sentences in habitual, clear, loud, and slow conditions. Sentences were equated for peak amplitude and mixed with multitalker babble for presentation to listeners. Using a computerized visual analog scale, listeners judged intelligibility or speech severity as operationally defined in Sussman and Tjaden (2012). Loud and clear but not slow conditions improved intelligibility relative to the habitual condition. With the exception of the loud condition for the PD group, speech severity did not improve above habitual and was reduced relative to habitual in some instances. Intelligibility and speech severity were strongly related, but relationships for disordered speakers were weaker in clear and slow conditions versus habitual. Both clear and loud speech show promise for improving intelligibility and maintaining or improving speech severity in multitalker babble for speakers with mild dysarthria secondary to MS or PD, at least as these perceptual constructs were defined and measured in this study. Although scaled intelligibility and speech severity overlap, the metrics further appear to have some separate value in documenting treatment-related speech changes.
Project description:In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
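The structure of the inversion cost function (formant distance plus regularization and continuity terms) can be illustrated with a toy example. The linear articulatory-to-formant map, the two parameters, and all constants below are invented for the sketch; this is not the Maeda model, its chain-matrix acoustics, or the paper's quasi-Newton implementation.

```python
# Toy illustration of the inversion cost structure: a normalized formant-
# matching term, a regularization term pulling parameters toward neutral,
# and a continuity term penalizing jumps from the previous frame.

def forward(p):
    """Hypothetical linear articulatory-to-formant map for 2 parameters."""
    jaw, tongue = p
    return [500 + 300 * jaw - 100 * tongue,   # F1 (Hz)
            1500 + 200 * jaw + 600 * tongue]  # F2 (Hz)

def cost(p, p_prev, target, lam_reg=0.001, lam_cont=0.01):
    f = forward(p)
    acoustic = sum((fi - ti) ** 2 / ti ** 2 for fi, ti in zip(f, target))
    reg = lam_reg * sum(pi ** 2 for pi in p)
    cont = lam_cont * sum((pi - qi) ** 2 for pi, qi in zip(p, p_prev))
    return acoustic + reg + cont

def minimize(p0, p_prev, target, lr=0.5, steps=2000, eps=1e-5):
    """Finite-difference gradient descent stands in for the quasi-Newton step."""
    p = list(p0)
    for _ in range(steps):
        base = cost(p, p_prev, target)
        grad = []
        for i in range(len(p)):
            q = list(p)
            q[i] += eps
            grad.append((cost(q, p_prev, target) - base) / eps)
        p = [pi - lr * gi for pi, gi in zip(p, grad)]
    return p

p_hat = minimize([0.0, 0.0], [0.0, 0.0], target=[560, 1680])
```

Because the regularization and continuity weights are small, the recovered parameters reproduce the target formants almost exactly while remaining close to the previous frame, which is the trade-off the full cost function is designed to manage along an articulatory trajectory.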
Project description:Human speech sounds are produced through a coordinated movement of structures along the vocal tract. Here we show highly structured neuronal encoding of vowel articulation. In medial-frontal neurons, we observe highly specific tuning to individual vowels, whereas superior temporal gyrus neurons have nonspecific, sinusoidally modulated tuning (analogous to motor cortical directional tuning). At the neuronal population level, a decoding analysis reveals that the underlying structure of vowel encoding reflects the anatomical basis of articulatory movements. This structured encoding enables accurate decoding of volitional speech segments and could be applied in the development of brain-machine interfaces for restoring speech in paralysed individuals.
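The "sinusoidally modulated tuning" analogy to motor cortical directional tuning is the classic cosine tuning model, rate = b0 + b1*cos(theta - theta_pref). The sketch below fits that model to simulated rates for five vowels placed at hypothetical angles on a ring; the angles, rates, and tuning parameters are all invented, not the recorded data.

```python
# Sketch: cosine tuning fit for a unit whose firing rate is sinusoidally
# modulated across vowels arranged on a ring (directional-tuning analogy).
import math

angles = [2 * math.pi * k / 5 for k in range(5)]  # five vowels on a ring

def simulate_rates(b0, b1, pref):
    return [b0 + b1 * math.cos(t - pref) for t in angles]

def fit_cosine_tuning(rates):
    """Recover baseline, modulation depth, and preferred angle by projecting
    onto cos/sin (exact for evenly spaced angles)."""
    n = len(rates)
    b0 = sum(rates) / n
    bc = 2 / n * sum(r * math.cos(t) for r, t in zip(rates, angles))
    bs = 2 / n * sum(r * math.sin(t) for r, t in zip(rates, angles))
    return b0, math.hypot(bc, bs), math.atan2(bs, bc)

rates = simulate_rates(20.0, 8.0, pref=2.0)  # hypothetical broadly tuned unit
b0, depth, pref = fit_cosine_tuning(rates)
```

A sharply tuned unit (high depth relative to baseline, responding to essentially one vowel) versus a broadly modulated one can then be distinguished by the fitted modulation depth, mirroring the contrast the abstract draws between medial-frontal and superior temporal gyrus neurons.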
Project description:PURPOSE:Infectious agents, such as SARS-CoV-2, can be carried by droplets expelled during breathing. The spatial dissemination of droplets varies according to their initial velocity. After a short literature review, our goal was to determine the velocity of the exhaled air during vocal exercises. METHODS:A propylene glycol cloud produced by two e-cigarette users allowed visualization of the exhaled air emitted during vocal exercises. Airflow velocities were measured during the first 200 ms of a long exhalation, a sustained vowel /a/, and varied vocal exercises. For the long exhalation and the sustained vowel /a/, the decrease of airflow velocity was measured until 3 s. Results were compared with a Computational Fluid Dynamics (CFD) study using boundary conditions consistent with our experimental study. RESULTS:Regarding the production of vowels, higher velocities were found in loud and whispered voices than in normal voice. Voiced consonants like /?/ or /v/ generated higher velocities than vowels. Some voiceless consonants, e.g., /t/, generated high velocities, but long exhalation had the highest velocities. Semi-occluded vocal tract exercises generated faster airflow velocities than loud speech, with a decreased velocity during voicing. The initial velocity decreased quickly, as shown for the long exhalation and the sustained vowel /a/. Velocities were consistent with the CFD data. CONCLUSION:Initial velocity of the exhaled air is a key factor influencing droplet trajectory. Our study revealed that vocal exercises produce a slower airflow than long exhalation. Speech therapy should, therefore, not be associated with an increased risk of contamination when implementing standard recommendations.
Project description:PURPOSE:This study aimed to evaluate the role of motor control immaturity in the speech production characteristics of 4-year-old children, compared to adults. Specifically, two indices were examined: trial-to-trial variability, which is assumed to be linked to motor control accuracy, and anticipatory extra-syllabic vowel-to-vowel coarticulation, which is assumed to be linked to the comprehensiveness, maturity and efficiency of sensorimotor representations in the central nervous system. METHOD:Acoustic and articulatory (ultrasound) data were recorded for 20 children and 10 adults, all native speakers of Canadian French, during the production of isolated vowels and vowel-consonant-vowel (V1-C-V2) sequences. Trial-to-trial variability was measured in isolated vowels. Extra-syllabic anticipatory coarticulation was assessed in V1-C-V2 sequences by measuring the patterns of variability of V1 associated with variations in V2. Acoustic data were reported for all subjects and articulatory data, for a subset of 6 children and 2 adults. RESULTS:Trial-to-trial variability was significantly larger in children. Systematic and significant anticipation of V2 in V1 was always found in adults, but was rare in children. Significant anticipation was observed in children only when V1 was /a/, and only along the antero-posterior dimension, with a much smaller magnitude than in adults. A closer analysis of individual speakers revealed that some children showed adult-like anticipation along this dimension, whereas the majority did not. CONCLUSION:The larger trial-to-trial variability and the lack of anticipatory behavior in most children (two phenomena that have been observed in several non-speech motor tasks) support the hypothesis that motor control immaturity may explain a large part of the differences observed between speech production in adults and 4-year-old children, apart from other causes that may be linked with language development.
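The two indices above can each be reduced to a simple acoustic computation, sketched below on invented F2 values: trial-to-trial variability as the dispersion of a formant across repetitions of the same isolated vowel, and anticipatory coarticulation as the shift in V1's formant with the identity of the upcoming V2. All numbers are placeholders, not the study's data.

```python
# Sketch of the two indices on hypothetical F2 values (Hz).
import statistics

# Repetitions of isolated /a/ (trial-to-trial variability):
adult_a = [1210, 1225, 1218, 1205, 1220]
child_a = [1150, 1290, 1230, 1330, 1185]

adult_var = statistics.stdev(adult_a)
child_var = statistics.stdev(child_a)

# F2 of V1 = /a/ in /a-C-V2/ sequences, grouped by V2; anticipation of V2
# shifts V1 toward the upcoming vowel's formant pattern:
adult_v1 = {"i": [1320, 1335], "u": [1140, 1150]}
child_v1 = {"i": [1240, 1200], "u": [1230, 1215]}

def anticipation(v1_by_v2):
    """Magnitude (Hz) of V1's contextual shift between /i/ and /u/ contexts."""
    means = {v2: sum(x) / len(x) for v2, x in v1_by_v2.items()}
    return abs(means["i"] - means["u"])

print(adult_var < child_var, anticipation(adult_v1) > anticipation(child_v1))
```

In this toy configuration the child shows larger trial-to-trial dispersion but a much smaller contextual shift than the adult, which is the qualitative pattern the study reports.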
Project description:This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear, hearing impaired, and overenunciate conditions. A variety of acoustic measures were obtained. Relative to habitual, the clear, hearing impaired, and overenunciate conditions were associated with different magnitudes of acoustic change for measures of vowel production, speech timing, and vocal intensity. The overenunciate condition tended to yield the greatest magnitude of change in vowel spectral measures and speech timing, followed by the hearing impaired and clear conditions. SPL tended to be the greatest in the hearing impaired condition for half of the speakers studied. Different instructions for eliciting clear speech yielded acoustic adjustments of varying magnitude. Results have implications for direct comparison of studies using different instructions for eliciting clear speech. Results also have implications for optimizing clear speech training programs.
Project description:Vowel reduction is a prominent feature of American English, as well as other stress-timed languages. As a phonological process, vowel reduction neutralizes multiple vowel quality contrasts in unstressed syllables. For bilinguals whose native language is not characterized by large spectral and durational differences between tonic and atonic vowels, systematically reducing unstressed vowels to the central vowel space can be problematic. Failure to maintain this pattern of stressed-unstressed syllables in American English is one key element that contributes to a "foreign accent" in second language speakers. Reduced vowels, or "schwas," have also been identified as particularly vulnerable to the co-articulatory effects of adjacent consonants. The current study examined the effects of adjacent sounds on the spectral and temporal qualities of schwa in word-final position. Three groups of English-speaking adults were tested: Miami-based monolingual English speakers, early Spanish-English bilinguals, and late Spanish-English bilinguals. Subjects performed a reading task to examine their schwa productions in fluent speech when schwas were preceded by consonants from various points of articulation. Results indicated that monolingual English and late Spanish-English bilingual groups produced targeted vowel qualities for schwa, whereas early Spanish-English bilinguals lacked homogeneity in their vowel productions. This extends prior claims that schwa is targetless for F2 position for native speakers to highly-proficient bilingual speakers. Though spectral qualities lacked homogeneity for early Spanish-English bilinguals, early bilinguals produced schwas with near native-like vowel duration. In contrast, late bilinguals produced schwas with significantly longer durations than English monolinguals or early Spanish-English bilinguals. 
Our results suggest that the temporal properties of a language are better integrated into second language phonologies than spectral qualities. Finally, we examined the role of nonstructural variables (e.g., linguistic history measures) in predicting native-like vowel duration. These factors included age of L2 learning, amount of L1 use, and self-reported bilingual dominance. Our results suggested that the sociolinguistic factors that predicted native-like reduced-vowel duration differed from those that predicted native-like vowel qualities across multiple phonetic environments.