Functional Characterization of the Human Speech Articulation Network.
ABSTRACT: A number of brain regions have been implicated in articulation, but their precise computations remain debated. Using functional magnetic resonance imaging, we examine the degree of functional specificity of articulation-responsive brain regions to constrain hypotheses about their contributions to speech production. We find that articulation-responsive regions (1) are sensitive to articulatory complexity, but (2) are largely nonoverlapping with nearby domain-general regions that support diverse goal-directed behaviors. Furthermore, premotor articulation regions show selectivity for speech production over some related tasks (respiration control), but not others (nonspeech oral-motor [NSO] movements). This overlap between speech and nonspeech movements concords with electrocorticographic evidence that these regions encode articulators and their states, and with patient evidence whereby articulatory deficits are often accompanied by oral-motor deficits. In contrast, the superior temporal regions show strong selectivity for articulation relative to nonspeech movements, suggesting that these regions play a specific role in speech planning/production. Finally, articulation-responsive portions of posterior inferior frontal gyrus show some selectivity for articulation, in line with the hypothesis that this region prepares an articulatory code that is passed to the premotor cortex. Taken together, these results inform the architecture of the human articulation system.
Project description:The contribution of insular cortex to speech production remains unclear and controversial given diverse findings from functional neuroimaging and lesion data. To create a precise spatiotemporal map of insular activity, we performed a series of experiments (single-word articulations of varying complexity, non-speech orofacial movements, and speech listening) in a cohort of 27 patients implanted with penetrating intracranial electrodes. The posterior insula was robustly active bilaterally, but only after the onset of articulation, during listening to speech, and during production of non-speech mouth movements. Preceding articulation, activity was very sparse and localized primarily to the frontal operculum rather than the insula. The posterior insula was active coincident with the superior temporal gyrus but, unlike the superior temporal gyrus, responded more strongly to self-generated than to external speech. These findings support the conclusion that the insula does not serve pre-articulatory preparatory roles.
Project description:When speaking, we dynamically coordinate movements of our jaw, tongue, lips, and larynx. To investigate the neural mechanisms underlying articulation, we used direct cortical recordings from human sensorimotor cortex while participants spoke natural sentences that included sounds spanning the entire English phonetic inventory. We used deep neural networks to infer speakers' articulator movements from produced speech acoustics. Individual electrodes encoded a diversity of articulatory kinematic trajectories (AKTs), each revealing coordinated articulator movements toward specific vocal tract shapes. AKTs captured a wide range of movement types, yet they could be differentiated by the place of vocal tract constriction. Additionally, AKTs manifested out-and-back trajectories with harmonic oscillator dynamics. While AKTs were functionally stereotyped across different sentences, context-dependent encoding of preceding and following movements during production of the same phoneme demonstrated the cortical representation of coarticulation. Articulatory movements encoded in sensorimotor cortex give rise to the complex kinematics underlying continuous speech production.
Project description:Human speech sounds are produced through a coordinated movement of structures along the vocal tract. Here we show highly structured neuronal encoding of vowel articulation. In medial-frontal neurons, we observe highly specific tuning to individual vowels, whereas superior temporal gyrus neurons have nonspecific, sinusoidally modulated tuning (analogous to motor cortical directional tuning). At the neuronal population level, a decoding analysis reveals that the underlying structure of vowel encoding reflects the anatomical basis of articulatory movements. This structured encoding enables accurate decoding of volitional speech segments and could be applied in the development of brain-machine interfaces for restoring speech in paralysed individuals.
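The "sinusoidally modulated tuning (analogous to motor cortical directional tuning)" described above is conventionally modeled as a cosine tuning curve. The sketch below is a minimal illustration of fitting such a curve by linear least squares, not the study's actual analysis; the function name and parameterization are assumptions for illustration.

```python
import numpy as np

def fit_cosine_tuning(angles, rates):
    """Fit the cosine tuning model r(theta) = b0 + a*cos(theta) + b*sin(theta),
    equivalently b0 + m*cos(theta - theta_pref).
    Returns the baseline b0, modulation depth m, and preferred angle theta_pref."""
    X = np.column_stack([np.ones_like(angles), np.cos(angles), np.sin(angles)])
    b0, a, b = np.linalg.lstsq(X, rates, rcond=None)[0]
    return b0, np.hypot(a, b), np.arctan2(b, a)
```

Because the cosine model is linear in its coefficients, the fit reduces to ordinary least squares on the cosine and sine of the tuning variable, which is why this parameterization is standard for directional tuning analyses.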
Project description:A central challenge for articulatory speech synthesis is the simulation of realistic articulatory movements, which is critical for the generation of highly natural and intelligible speech. This includes modeling coarticulation, i.e., the context-dependent variation of the articulatory and acoustic realization of phonemes, especially of consonants. Here we propose a method to simulate the context-sensitive articulation of consonants in consonant-vowel syllables. To achieve this, the vocal tract target shape of a consonant in the context of a given vowel is derived as the weighted average of three measured and acoustically optimized reference vocal tract shapes for that consonant in the context of the corner vowels /a/, /i/, and /u/. The weights are determined by mapping the target shape of the given context vowel into the vowel subspace spanned by the corner vowels. The model was applied to the synthesis of consonant-vowel syllables with the consonants /b/, /d/, /g/, /l/, /r/, /m/, /n/ in all combinations with the eight long German vowels. In a perception test, the mean recognition rate for the consonants in the isolated syllables was 82.4%. This demonstrates the potential of the approach for highly intelligible articulatory speech synthesis.
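The interpolation scheme described above can be sketched as follows, assuming vocal tract shapes are represented as parameter vectors: the context vowel is expressed in barycentric coordinates within the subspace spanned by the corner vowels /a/, /i/, /u/, and the same weights blend the three consonant reference shapes. All function names and the least-squares formulation are illustrative assumptions, not taken from the study's implementation.

```python
import numpy as np

def barycentric_weights(v, a, i, u):
    """Express a context-vowel shape v in the subspace spanned by the
    corner-vowel shapes a (/a/), i (/i/), u (/u/): solve
    v ~= w_a*a + w_i*i + w_u*u with the weights summing to 1."""
    B = np.column_stack([a - u, i - u])        # basis relative to /u/
    w_ai, *_ = np.linalg.lstsq(B, v - u, rcond=None)
    return np.array([w_ai[0], w_ai[1], 1.0 - w_ai.sum()])

def consonant_target(weights, C_a, C_i, C_u):
    """Weighted average of the three measured consonant reference shapes,
    one per corner-vowel context."""
    return weights[0] * C_a + weights[1] * C_i + weights[2] * C_u
```

For example, a context vowel lying midway between /a/ and /i/ in the shape space receives weights of roughly 0.5, 0.5, and 0, so its consonant target is the midpoint of the /a/- and /i/-context reference shapes.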
Project description:Acoustic studies have revealed that patients with essential tremor treated with thalamic deep brain stimulation (DBS) may suffer from speech deterioration in terms of imprecise oral articulation and reduced voicing control. From the acoustic signal alone, however, one cannot infer whether this deterioration is due to a general slowing of the speech motor system (e.g., a target undershoot of a desired articulatory goal resulting from being too slow) or to disturbed coordination (e.g., a target undershoot caused by problems with the relative phasing of articulatory movements). To elucidate this issue, we investigated both the acoustics and the articulatory patterns of the labial and lingual systems using electromagnetic articulography (EMA) in twelve essential tremor patients treated with thalamic DBS and twelve age- and sex-matched controls. By comparing patients with activated (DBS-ON) and inactivated (DBS-OFF) stimulation against control speakers, we show that critical changes in speech dynamics occur on two levels: with inactivated stimulation (DBS-OFF), patients showed coordination problems of the labial and lingual systems in terms of articulatory imprecision and slowness. These articulatory discoordination effects worsened under activated stimulation, accompanied by an additional overall slowing of the speech motor system. This leads to poor syllable production on the acoustic surface, reflecting an aggravation of pre-existing cerebellar deficits and/or involvement of the upper motor fibers of the internal capsule.
Project description:The neurophysiological basis of stuttering may involve deficits that affect dynamic interactions among neural structures supporting fluent speech processing. Here, we examined functional and structural connectivity within corticocortical and thalamocortical loops in adults who stutter. For functional connectivity, we placed seeds in the left and right inferior frontal Brodmann area 44 (BA44) and in the ventral lateral nucleus (VLN) of the thalamus. Subject-specific seeds were based on peak activation voxels captured during speech and nonspeech tasks using functional magnetic resonance imaging. Psychophysiological interaction (PPI) analysis was used to find brain regions with heightened functional connectivity with these cortical and subcortical seeds during speech and nonspeech tasks. Probabilistic tractography was used to track white matter tracts in each hemisphere using the same seeds. Both PPI and tractography supported connectivity deficits between the left BA44 and the left premotor regions, whereas connectivity among homologous right-hemisphere structures was significantly increased in the stuttering group. No group differences in functional connectivity between BA44 and auditory regions were found. The functional connectivity results derived from the VLN seeds were less definitive and were not supported by the tractography results. Our data provide the strongest support for deficient left-hemisphere inferior frontal-to-premotor connectivity as a neural correlate of stuttering.
Project description:Auditory speech perception enables listeners to access phonological categories from speech sounds. During speech production and speech motor learning, speakers experience matched auditory and somatosensory input. Accordingly, access to phonetic units might also be provided by somatosensory information. The present study assessed whether humans can identify vowels using somatosensory feedback alone, without auditory feedback. A tongue-positioning task was used in which participants were required to achieve different tongue postures within the /e, ɛ, a/ articulatory range, in a procedure that was entirely nonspeech-like, involving distorted visual feedback of tongue shape. Tongue postures were measured using electromagnetic articulography. At the end of each tongue-positioning trial, subjects were required to whisper the corresponding vocal tract configuration with masked auditory feedback and to identify the vowel associated with the reached tongue posture. Masked auditory feedback ensured that vowel categorization was based on somatosensory rather than auditory feedback. A separate group of subjects was required to auditorily classify the whispered sounds. In addition, we modeled the link between vowel categories and tongue postures in normal speech production with a Bayesian classifier based on the tongue postures recorded from the same speakers for several repetitions of the /e, ɛ, a/ vowels during a separate speech production task. Overall, our results indicate that vowel categorization is possible with somatosensory feedback alone, with an accuracy similar to that of the auditory perception of whispered sounds, and in congruence with normal speech articulation, as accounted for by the Bayesian classifier.
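A Bayesian classifier of the kind described above can be sketched minimally as one Gaussian per vowel category fitted to recorded tongue postures, with a new posture assigned to the category of maximum posterior probability under equal priors. The class name, the diagonal-covariance simplification, and the equal-prior assumption are illustrative choices, not details taken from the study.

```python
import numpy as np

class GaussianVowelClassifier:
    """Minimal Bayesian classifier: fit one Gaussian per vowel category
    to tongue-posture feature vectors, then classify a new posture by
    the maximum posterior under equal priors (i.e., maximum likelihood)."""

    def fit(self, postures, labels):
        X = np.asarray(postures, dtype=float)
        y = np.asarray(labels)
        self.classes = sorted(set(labels))
        self.params = {}
        for c in self.classes:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            # Diagonal covariance keeps the sketch stable with few samples.
            var = Xc.var(axis=0) + 1e-6
            self.params[c] = (mu, var)
        return self

    def predict(self, posture):
        x = np.asarray(posture, dtype=float)

        def log_lik(c):
            mu, var = self.params[c]
            return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

        return max(self.classes, key=log_lik)
```

With equal priors the posterior comparison reduces to comparing class log-likelihoods, which is why `predict` simply takes the argmax over per-category Gaussian log-likelihoods.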
Project description:BACKGROUND: There has been plentiful evidence of kinesthetically induced rapid compensation for unanticipated perturbation in speech articulatory movements. However, the role of auditory information in stabilizing articulation has been little studied except for the control of voice fundamental frequency, voice amplitude and vowel formant frequencies. Although the influence of auditory information on the articulatory control process is evident in unintended speech errors caused by delayed auditory feedback, the direct and immediate effect of auditory alteration on the movements of articulators has not been clarified. METHODOLOGY/PRINCIPAL FINDINGS: This work examined whether temporal changes in the auditory feedback of bilabial plosives immediately affects the subsequent lip movement. We conducted experiments with an auditory feedback alteration system that enabled us to replace or block speech sounds in real time. Participants were asked to produce the syllable /pa/ repeatedly at a constant rate. During the repetition, normal auditory feedback was interrupted, and one of three pre-recorded syllables /pa/, /?a/, or /pi/, spoken by the same participant, was presented once at a different timing from the anticipated production onset, while no feedback was presented for subsequent repetitions. Comparisons of the labial distance trajectories under altered and normal feedback conditions indicated that the movement quickened during the short period immediately after the alteration onset, when /pa/ was presented 50 ms before the expected timing. Such change was not significant under other feedback conditions we tested. CONCLUSIONS/SIGNIFICANCE: The earlier articulation rapidly induced by the progressive auditory input suggests that a compensatory mechanism helps to maintain a constant speech rate by detecting errors between the internally predicted and actually provided auditory information associated with self movement. 
The timing- and context-dependent effects of feedback alteration suggest that sensory error detection operates within a temporally asymmetric window in which acoustic features of the syllable to be produced may be encoded.
Project description:Other than the well-known motor activity in the precentral gyrus, functional magnetic resonance imaging (fMRI) studies have found that the ventral part of the precentral gyrus is activated in response to linguistic auditory stimuli. It has been proposed that the premotor cortex in the precentral gyrus supports the comprehension of speech, but the precise function of this area remains debated because patients with frontal lesions that include the precentral gyrus do not exhibit disturbances in speech comprehension. We report on a patient who underwent resection of a tumor in the precentral gyrus, with electrocorticographic recordings obtained while she performed a verb-generation task during awake craniotomy. Consistent with previous fMRI studies, high-gamma band auditory activity was observed in the precentral gyrus. Due to the location of the tumor, the patient underwent resection of the auditory-responsive precentral area, which resulted in the postoperative expression of a characteristic articulatory disturbance known as apraxia of speech (AOS). The language function of the patient was otherwise preserved, and she exhibited intact comprehension of both spoken and written language. These findings demonstrate that a lesion restricted to the ventral precentral gyrus is sufficient for the expression of AOS and suggest that the auditory-responsive area plays an important role in the execution of fluent speech rather than in its comprehension. They also confirm that the function of the premotor area is predominantly motor in nature, and that its sensory responses are more consistent with the "sensory theory of speech production," which proposes that sensory representations are used to guide motor-articulatory processes.
Project description:Several perisylvian brain regions show preferential activation for spoken language above and beyond other complex sounds. These "speech-selective" effects might be driven by regions' intrinsic biases for processing the acoustical or informational properties of speech. Alternatively, such speech selectivity might emerge through extensive experience in perceiving and producing speech sounds. This functional magnetic resonance imaging (fMRI) study disambiguated such audiomotor expertise from speech selectivity by comparing activation for listening to speech and music in female professional violinists and actors. Audiomotor expertise effects were identified in several right and left superior temporal regions that responded to speech in all participants and to music in violinists more than in actors. Regions associated with the acoustic/informational content of speech were identified along the entire length of the superior temporal sulci bilaterally, where activation was greater for speech than music in all participants. Finally, an effect of performing-arts training was identified in bilateral premotor regions commonly activated by finger and mouth movements, as well as in right-hemisphere "language regions." These results show that seemingly speech-specific neural responses can be abolished, and even reversed, by long-term audiomotor experience.