Sublexical properties of spoken words modulate activity in Broca's area but not superior temporal cortex: implications for models of speech recognition.
ABSTRACT: Many models of spoken word recognition posit that the acoustic stream is parsed into phoneme-level units, which in turn activate larger representations [McClelland, J. L., & Elman, J. L. The TRACE model of speech perception. Cognitive Psychology, 18, 1-86, 1986], whereas others suggest that larger units of analysis are activated without the need for segmental mediation [Greenberg, S. A multitier theoretical framework for understanding spoken language. In S. Greenberg & W. A. Ainsworth (Eds.), Listening to speech: An auditory perspective (pp. 411-433). Mahwah, NJ: Erlbaum, 2005; Klatt, D. H. Speech perception: A model of acoustic-phonetic analysis and lexical access. Journal of Phonetics, 7, 279-312, 1979; Massaro, D. W. Preperceptual images, processing time, and perceptual units in auditory perception. Psychological Review, 79, 124-145, 1972]. Identifying segmental effects in the brain's response to speech may speak to this question. For example, if such effects were localized to relatively early processing stages in auditory cortex, this would support a model of speech recognition in which segmental units are explicitly parsed out. In contrast, segmental processes that occur outside auditory cortex may indicate that alternative models should be considered. The current fMRI experiment manipulated the phonotactic frequency (PF) of words that were presented auditorily in short lists while participants performed a pseudoword detection task. PF is thought to modulate networks in which phoneme-level units are represented. The present experiment identified activity in the left inferior frontal gyrus that was positively correlated with PF. No effects of PF were found in temporal lobe regions. We propose that the observed phonotactic effects during speech listening reflect the strength of the association between acoustic speech patterns and articulatory speech codes involving phoneme-level units. On the basis of existing lesion evidence, we interpret this auditory-motor association as functioning primarily in production. These findings are consistent with the view that phoneme-level units are not necessarily accessed during speech recognition.
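Phonotactic frequency is usually quantified from a phonemically transcribed lexicon, for example as positional segment probabilities and biphone probabilities. The abstract does not say how PF was computed, so the Python sketch below rests on assumptions: a hypothetical toy lexicon, type counts (published calculators often weight words by log token frequency instead), and probabilities averaged across positions (some studies sum them).

```python
from collections import Counter

# Hypothetical representation: a word is a tuple of phoneme symbols.
Word = tuple[str, ...]


def positional_segment_probs(lexicon: list[Word]) -> dict[tuple[int, str], float]:
    """P(phoneme at position i): words with that phoneme at i / words with any phoneme at i."""
    seg_counts: Counter = Counter()
    pos_totals: Counter = Counter()
    for word in lexicon:
        for i, ph in enumerate(word):
            seg_counts[(i, ph)] += 1
            pos_totals[i] += 1
    return {(i, ph): n / pos_totals[i] for (i, ph), n in seg_counts.items()}


def biphone_probs(lexicon: list[Word]) -> dict[tuple[int, str, str], float]:
    """P(phoneme pair starting at position i), normalized over all pairs at i."""
    pair_counts: Counter = Counter()
    pos_totals: Counter = Counter()
    for word in lexicon:
        for i, (a, b) in enumerate(zip(word, word[1:])):
            pair_counts[(i, a, b)] += 1
            pos_totals[i] += 1
    return {(i, a, b): n / pos_totals[i] for (i, a, b), n in pair_counts.items()}


def phonotactic_frequency(word: Word, lexicon: list[Word]) -> tuple[float, float]:
    """Mean positional segment probability and mean biphone probability of `word`."""
    seg = positional_segment_probs(lexicon)
    bi = biphone_probs(lexicon)
    seg_mean = sum(seg.get((i, ph), 0.0) for i, ph in enumerate(word)) / len(word)
    pairs = list(zip(word, word[1:]))
    bi_mean = (
        sum(bi.get((i, a, b), 0.0) for i, (a, b) in enumerate(pairs)) / len(pairs)
        if pairs
        else 0.0
    )
    return seg_mean, bi_mean


lexicon = [("k", "a", "t"), ("k", "a", "p"), ("b", "a", "t")]
print(phonotactic_frequency(("k", "a", "t"), lexicon))  # common segments and pairs
print(phonotactic_frequency(("b", "a", "p"), lexicon))  # rarer segments and pairs
```

A real PF manipulation would draw on a full pronunciation dictionary with frequency counts; the toy lexicon only shows how words built from common segments in common positions receive higher scores.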
Project description: Listeners show a reliable bias towards interpreting speech sounds in a way that conforms to linguistic restrictions (phonotactic constraints) on the permissible patterning of speech sounds in a language. This perceptual bias may enforce and strengthen the systematicity that is the hallmark of phonological representation. Using Granger causality analysis of magnetic resonance imaging (MRI)-constrained magnetoencephalography (MEG) and electroencephalography (EEG) data, we tested the differential predictions of rule-based, frequency-based, and top-down lexical influence-driven explanations of the processes that produce phonotactic biases in phoneme categorization. Consistent with the top-down lexical influence account, brain regions associated with the representation of words had a stronger influence on acoustic-phonetic regions in trials that led to the identification of phonotactically legal (versus illegal) word-initial consonant clusters. Regions associated with the application of linguistic rules had no such effect. Similarly, high-frequency phoneme clusters failed to produce stronger feedforward influences from acoustic-phonetic regions on areas associated with higher linguistic representation. These results suggest that top-down lexical influences contribute to the systematicity of phonological representation.
Project description: Phonotactic frequency effects play a crucial role in a number of debates over language processing and representation. It is unclear, however, whether these effects reflect prelexical sensitivity to phonotactic frequency or lexical "gang effects" in speech perception. In this paper, we use Granger causality analysis of MR-constrained MEG/EEG data to understand how phonotactic frequency influences neural processing dynamics during auditory lexical decision. Effective connectivity analysis showed weaker feedforward influence from brain regions involved in acoustic-phonetic processing (superior temporal gyrus) to lexical areas (supramarginal gyrus) for high phonotactic frequency words, but stronger top-down lexical influence for the same items. Low-entropy nonwords (nonwords judged to closely resemble real words) showed a similar pattern of interactions between brain regions involved in lexical and acoustic-phonetic processing. These results contradict the predictions of a feedforward model of phonotactic frequency facilitation but support the predictions of a lexically mediated account.
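Both Granger analyses above ask whether the past of one region's activity improves prediction of another region's activity beyond that region's own past. The abstracts do not describe the estimator, and the MEG/EEG studies work with many source-space regions at once, so the following is only a minimal bivariate, time-domain sketch built on ordinary least squares autoregressive models. The function names, the lag order `p`, and the simulated signals are hypothetical.

```python
import numpy as np


def lagged(series: np.ndarray, p: int) -> np.ndarray:
    """Design columns series[t-1], ..., series[t-p] for t = p, ..., T-1."""
    T = len(series)
    return np.column_stack([series[p - k : T - k] for k in range(1, p + 1)])


def granger(x, y, p: int = 2) -> float:
    """Granger influence of x on y: log ratio of the residual variances of the
    restricted model (y's own past) and the full model (y's past plus x's past)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged(y, p)])
    full = np.hstack([restricted, lagged(x, p)])

    def residual_variance(design: np.ndarray) -> float:
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        return float(np.var(target - design @ beta))

    return float(np.log(residual_variance(restricted) / residual_variance(full)))


# Simulated "acoustic-phonetic" signal x driving a "lexical" signal y with a one-sample lag.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.zeros(2000)
y[1:] = 0.8 * x[:-1] + 0.2 * rng.standard_normal(1999)
print(granger(x, y), granger(y, x))  # the first value is much larger
```

A larger `granger(x, y)` than `granger(y, x)` indicates stronger directed influence from x to y. In the studies above, the quantities of interest are differences in such influences between conditions (for example, legal versus illegal clusters, or high versus low phonotactic frequency words), not a single simulated pair.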
Project description: Although there is broad consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustic properties into discrete perceptual units remain undetermined. This gap in knowledge is partly due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique, which allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in da/ga categorization with two phonetic contexts, Al- or Ar-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
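The core idea of a classification image is to correlate the random noise on each trial with the listener's response. Published Auditory Classification Image work fits a regularized generalized linear model to the noise fields; the sketch below swaps in the simpler classical reverse-correlation estimator (mean noise on trials answered "da" minus mean noise on trials answered "ga"). For brevity the simulated trials contain noise only, and the linear observer, its template, and the array shapes are all hypothetical.

```python
import numpy as np


def classification_image(noise: np.ndarray, responses: np.ndarray) -> np.ndarray:
    """Reverse-correlation estimate: mean noise field on 'da' trials minus on 'ga' trials.

    noise: (n_trials, n_freq, n_time) time-frequency noise fields.
    responses: boolean array, True where the listener answered 'da'.
    """
    noise = np.asarray(noise, dtype=float)
    responses = np.asarray(responses, dtype=bool)
    return noise[responses].mean(axis=0) - noise[~responses].mean(axis=0)


def simulate_observer(noise, template, rng, internal_sd=1.0):
    """Hypothetical linear observer: answers 'da' when the template-weighted noise
    plus Gaussian internal noise is positive."""
    drive = np.tensordot(noise, template, axes=([1, 2], [0, 1]))
    return drive + internal_sd * rng.standard_normal(len(noise)) > 0


rng = np.random.default_rng(1)
n_freq, n_time = 8, 10
template = np.zeros((n_freq, n_time))
template[2, 3:6] = 1.0  # hypothetical cue region, e.g. a formant onset
noise = rng.standard_normal((4000, n_freq, n_time))
responses = simulate_observer(noise, template, rng)
ci = classification_image(noise, responses)
print(np.unravel_index(np.argmax(ci), ci.shape))  # lands inside the template region
```

The estimated image recovers the cells the observer relied on; a group-level analysis like the one described would then test such images across participants, for example with a cluster-based nonparametric test.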
Project description: Do speakers of all languages use segmental speech sounds when they produce words? Existing models of language production generally assume a mental representation of individual segmental units, or phonemes, but the bulk of evidence comes from speakers of European languages in which the orthographic system codes explicitly for speech sounds. By contrast, in languages with nonalphabetical scripts, such as Mandarin Chinese, individual speech sounds are not orthographically represented, raising the possibility that speakers of these languages do not use phonemes as fundamental processing units. We used event-related potentials (ERPs) combined with behavioral measurement to investigate the role of phonemes in Mandarin production. Mandarin native speakers named colored line drawings of objects using color adjective-noun phrases; color and object name either shared the initial phoneme or were phonologically unrelated. Whereas naming latencies were unaffected by phoneme repetition, ERP responses were modulated from 200 ms after picture onset. Our ERP findings thus provide strong support for the claim that phonemic segments constitute fundamental units of phonological encoding even for speakers of languages that do not encode such units orthographically.
Project description: During speech perception, a central task of the auditory cortex is to analyze complex acoustic patterns to allow detection of the words that encode a linguistic message. It is generally thought that this process includes at least one intermediate, phonetic level of representation [2-6], localized bilaterally in the superior temporal lobe [7-9]. Phonetic representations reflect a transition from acoustic to linguistic information, classifying acoustic patterns into linguistically meaningful units, which can serve as input to mechanisms that access abstract word representations [10, 11]. While recent research has identified neural signals arising from successful recognition of individual words in continuous speech [12-15], no explicit neurophysiological signal has been found demonstrating the transition from acoustic and/or phonetic to symbolic, lexical representations. Here, we report a response reflecting the incremental integration of phonetic information for word identification, predominantly localized to the left temporal lobe. The short response latency, approximately 114 ms relative to phoneme onset, suggests that phonetic information is used for lexical processing as soon as it becomes available. Responses also tracked word boundaries, confirming previous reports of immediate lexical segmentation [16, 17]. These new results were further investigated using a cocktail-party paradigm [18, 19] in which participants listened to a mix of two talkers, attending to one and ignoring the other. Analysis indicates neural lexical processing of the attended, but not the unattended, speech stream. Thus, while responses to acoustic features reflect attention through selective amplification of attended speech, responses consistent with a lexical processing model reveal categorically selective processing.
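The abstract does not spell out its lexical processing model. One common way to quantify incremental lexical integration, assumed here, is a cohort model: each incoming phoneme narrows the set of words consistent with the input so far, and phoneme surprisal and cohort entropy summarize that narrowing. The toy lexicon (one character per phoneme, made-up frequencies) and the function name are hypothetical.

```python
import math


def incremental_measures(word: str, lexicon: dict[str, float]) -> list[tuple[str, float, float]]:
    """For each phoneme of `word` (one character per phoneme; `word` must be in the
    lexicon), return (phoneme, surprisal in bits, cohort entropy in bits)."""
    prev_mass = sum(lexicon.values())
    results = []
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        # Cohort: words still consistent with the input heard so far.
        cohort = {w: f for w, f in lexicon.items() if w.startswith(prefix)}
        mass = sum(cohort.values())
        # Surprisal: how much this phoneme shrinks the frequency mass of the cohort.
        surprisal = math.log2(prev_mass / mass)
        # Entropy: uncertainty about which cohort member is being heard.
        entropy = sum((f / mass) * math.log2(mass / f) for f in cohort.values())
        results.append((word[k - 1], surprisal, entropy))
        prev_mass = mass
    return results


# Hypothetical lexicon with made-up frequencies.
lexicon = {"kat": 2.0, "kap": 1.0, "bat": 1.0}
for phoneme, s, h in incremental_measures("kat", lexicon):
    print(phoneme, round(s, 3), round(h, 3))
```

Time-locking such per-phoneme values to phoneme onsets yields predictors that can be related to neural responses; a phoneme that carries no new information about word identity (here the second one) has zero surprisal.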
Project description: Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this to fricatives, a class of speech sounds that requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150-350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account of speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. They also have major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.
Project description: Understanding speech in noisy environments is challenging, especially for seniors. Although evidence suggests that older adults increasingly recruit prefrontal cortices to offset reduced peripheral and central auditory processing, the brain mechanisms underlying such compensation remain elusive. Here we show that, relative to young adults, older adults show higher activation of frontal speech motor areas as measured by functional MRI during a syllable identification task at varying signal-to-noise ratios. This increased activity correlates with improved speech discrimination performance in older adults. Multivoxel pattern classification reveals that despite an overall phoneme dedifferentiation, older adults show greater specificity of phoneme representations in frontal articulatory regions than in auditory regions. Moreover, older adults with stronger frontal activity have higher phoneme specificity in frontal and auditory regions. Thus, preserved phoneme specificity and upregulation of activity in speech motor regions provide a means of compensation in older adults for decoding impoverished speech representations in adverse listening conditions.
Project description: A fundamental task of the ascending auditory system is to produce representations that facilitate the recognition of complex sounds. This is particularly challenging in the context of acoustic variability, such as that between different talkers producing the same phoneme. These representations are transformed as information is propagated throughout the ascending auditory system from the inner ear to the auditory cortex. Investigating these transformations and their role in speech recognition is key to understanding hearing impairment and the development of future clinical interventions. Here, we obtained neural responses to an extensive set of natural vowel-consonant-vowel phoneme sequences, each produced by multiple talkers, in three stages of the auditory processing pathway. Auditory nerve (AN) representations were simulated using a model of the peripheral auditory system, and extracellular neuronal activity was recorded in the inferior colliculus (IC) and primary auditory cortex (AI) of anaesthetized guinea pigs. A classifier was developed to examine the efficacy of these representations for recognizing the speech sounds. Individual neurons convey progressively less information from AN to AI. Nonetheless, at the population level, representations are sufficiently rich to facilitate recognition of consonants with a high degree of accuracy at all stages, indicating a progression from a dense, redundant representation to a sparse, distributed one. We examined the timescale of the neural code for consonant recognition and found that optimal timescales increase throughout the ascending auditory system, from a few milliseconds in the periphery to several tens of milliseconds in the cortex. Despite these longer timescales, we found little evidence to suggest that representations up to the level of AI become increasingly invariant to across-talker differences. Instead, our results support the idea that the role of the subcortical auditory system is one of dimensionality expansion, which could provide a basis for flexible classification of arbitrary speech sounds.
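One way to probe the timescale of a neural code is to bin spike trains at several widths and ask how well a classifier recovers the consonant from each binned representation. The abstract does not name its classifier, so this sketch assumes a leave-one-out nearest-centroid classifier on spike counts; the helper names and the simulated spike times are hypothetical.

```python
import numpy as np


def bin_spikes(spike_times, duration: float, bin_width: float) -> np.ndarray:
    """Spike counts in consecutive bins of `bin_width` seconds covering [0, duration]."""
    n_bins = int(np.ceil(duration / bin_width))
    edges = np.linspace(0.0, n_bins * bin_width, n_bins + 1)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts


def loo_nearest_centroid_accuracy(X, labels) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (Euclidean distance).
    Each class needs at least two trials."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    idx = np.arange(len(X))
    correct = 0
    for i in idx:
        train = idx != i
        centroids = np.stack([X[train & (labels == c)].mean(axis=0) for c in classes])
        predicted = classes[np.argmin(((centroids - X[i]) ** 2).sum(axis=1))]
        correct += int(predicted == labels[i])
    return correct / len(X)


def accuracy_at_timescale(trials, labels, duration, bin_width):
    X = [bin_spikes(t, duration, bin_width) for t in trials]
    return loo_nearest_centroid_accuracy(X, labels)


# Hypothetical responses to two consonants: same spike count, different spike timing.
rng = np.random.default_rng(2)
trials, labels = [], []
for label, latency in [("b", 0.010), ("d", 0.030)]:
    for _ in range(20):
        trials.append(latency + rng.uniform(-0.003, 0.003, size=3))
        labels.append(label)

for width in (0.005, 0.050):
    print(width, accuracy_at_timescale(trials, labels, 0.050, width))
```

With 5 ms bins the two simulated consonants are separable by spike timing, whereas a single 50 ms bin reduces every trial to the same spike count and accuracy drops to chance. Sweeping `bin_width` and keeping the width with the highest accuracy gives the kind of "optimal timescale" the abstract compares across AN, IC, and AI.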
Project description: Humans are adept at understanding speech despite the fact that our natural listening environment is often filled with interference. An example of this capacity is phoneme restoration, in which part of a word is completely replaced by noise, yet listeners report hearing the whole word. The neurological basis for this unconscious fill-in phenomenon is unknown, despite its being a fundamental characteristic of human hearing. Here, using direct cortical recordings in humans, we demonstrate that missing speech is restored at the acoustic-phonetic level in bilateral auditory cortex, in real time. This restoration is preceded by specific neural activity patterns in a separate language area, left frontal cortex, which predict the word that participants later report hearing. These results demonstrate that during speech perception, missing acoustic content is synthesized online from the integration of incoming sensory cues and the internal neural dynamics that bias word-level expectation and prediction.
Project description: Dyslexia is associated with abnormal performance on many auditory psychophysics tasks, particularly those involving the categorization of speech sounds. However, it is debated whether those apparent auditory deficits arise from (a) reduced sensitivity to particular acoustic cues, (b) the difficulty of experimental tasks, or (c) unmodeled lapses of attention. Here we investigate the relationship between phoneme categorization and reading ability, with special attention to the nature of the cue encoding the phoneme contrast (static versus dynamic), differences in task paradigm difficulty, and methodological details of psychometric model fitting. We find a robust relationship between reading ability and categorization performance, show that task difficulty cannot fully explain that relationship, and provide evidence that the deficit is not restricted to dynamic cue contrasts, contrary to prior reports. Finally, improved modeling of behavioral responses suggests that performance does differ between children with dyslexia and typical readers, but that the difference may be smaller than previously reported.
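Lapses matter for psychometric fitting because a few inattentive responses at the ends of a continuum can masquerade as a shallow category boundary. The abstract does not give its model, so this is a minimal sketch under stated assumptions: a logistic psychometric function with lower and upper lapse parameters, a symmetric lapse rate, and a brute-force maximum-likelihood grid search (real analyses typically use a numerical optimizer or Bayesian fitting). The continuum, trial counts, and parameter values are hypothetical.

```python
import numpy as np


def psychometric(x, alpha, beta, gamma, lam):
    """Logistic psychometric function: gamma is the lower asymptote and 1 - lam the
    upper asymptote, so gamma and lam absorb lapses toward either response."""
    core = 1.0 / (1.0 + np.exp(-beta * (x - alpha)))
    return gamma + (1.0 - gamma - lam) * core


def neg_log_likelihood(params, x, n_yes, n_total):
    """Binomial negative log-likelihood of the response counts."""
    p = np.clip(psychometric(x, *params), 1e-9, 1 - 1e-9)
    return -np.sum(n_yes * np.log(p) + (n_total - n_yes) * np.log(1 - p))


def fit_grid(x, n_yes, n_total, alphas, betas, lapses):
    """Brute-force maximum-likelihood fit with a symmetric lapse rate (gamma == lam)."""
    best, best_nll = None, np.inf
    for a in alphas:
        for b in betas:
            for lapse in lapses:
                nll = neg_log_likelihood((a, b, lapse, lapse), x, n_yes, n_total)
                if nll < best_nll:
                    best, best_nll = (a, b, lapse), nll
    return best, best_nll


# Hypothetical 7-step continuum, 200 trials per step, generated with 5% lapses.
x = np.linspace(-3, 3, 7)
n_total = np.full(7, 200)
n_yes = np.round(n_total * psychometric(x, 0.0, 3.0, 0.05, 0.05))

alphas, betas = np.linspace(-1, 1, 21), np.linspace(0.5, 6, 56)
(_, beta_free, lapse_free), _ = fit_grid(x, n_yes, n_total, alphas, betas, np.linspace(0, 0.15, 16))
(_, beta_nolapse, _), _ = fit_grid(x, n_yes, n_total, alphas, betas, [0.0])
print(beta_free, lapse_free, beta_nolapse)  # fixing lapses at zero flattens the slope
```

When lapses are forced to zero, the fit can only reach the 5% and 95% end points by flattening the curve, so the slope estimate drops well below the generating value even though the listener's category boundary is just as sharp.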