The role of the posterior superior temporal sulcus in audiovisual processing.
ABSTRACT: In this study we investigate previous claims that a region in the left posterior superior temporal sulcus (pSTS) is more activated by audiovisual than unimodal processing. First, we compare audiovisual to visual-visual and auditory-auditory conceptual matching using auditory or visual object names that are paired with pictures of objects or their environmental sounds. Second, we compare congruent and incongruent audiovisual trials when presentation is simultaneous or sequential. Third, we compare audiovisual stimuli that are either verbal (auditory and visual words) or nonverbal (pictures of objects and their associated sounds). The results demonstrate that, when task, attention, and stimuli are controlled, pSTS activation for audiovisual conceptual matching is 1) identical to that observed for intramodal conceptual matching, 2) greater for incongruent than congruent trials when auditory and visual stimuli are simultaneously presented, and 3) identical for verbal and nonverbal stimuli. These results are not consistent with previous claims that pSTS activation reflects the active formation of an integrated audiovisual representation. After a discussion of the stimulus and task factors that modulate activation, we conclude that, when stimulus input, task, and attention are controlled, pSTS is part of a distributed set of regions involved in conceptual matching, irrespective of whether the stimuli are audiovisual, auditory-auditory or visual-visual.
Project description:According to the Bayesian framework of multisensory integration, audiovisual stimuli associated with a stronger prior belief that they share a common cause (i.e., causal prior) are predicted to result in a greater degree of perceptual binding and therefore greater audiovisual integration. In the present psychophysical study, we systematically manipulated the causal prior while keeping sensory evidence constant. We paired auditory and visual stimuli during an association phase to be spatiotemporally either congruent or incongruent, with the goal of driving the causal prior in opposite directions for different audiovisual pairs. Following this association phase, every pairwise combination of the auditory and visual stimuli was tested in a typical ventriloquism-effect (VE) paradigm. The size of the VE (i.e., the shift of auditory localization towards the spatially discrepant visual stimulus) indicated the degree of multisensory integration. Results showed that exposure to an audiovisual pairing as spatiotemporally congruent compared to incongruent resulted in a larger subsequent VE (Experiment 1). This effect was further confirmed in a second VE paradigm, where the congruent and the incongruent visual stimuli flanked the auditory stimulus, and a VE in the direction of the congruent visual stimulus was shown (Experiment 2). Since the unisensory reliabilities for the auditory or visual components did not change after the association phase, the observed effects are likely due to changes in multisensory binding by association learning. As suggested by Bayesian theories of multisensory processing, our findings support the existence of crossmodal causal priors that are flexibly shaped by experience in a changing world.
Project description:In light of the growing body of research that is revealing the significant role of the auditory domain in sport, the present study aims to investigate the contribution of early auditory and visual information to the prediction of volleyball serves' length. To this purpose, three within-subjects experiments were run, which differed among them in terms of stimuli (audiovisual congruent vs audiovisual incongruent; audio only vs video only) and/or in terms of number of possible answers. In particular, expert volleyball players were asked to predict the length of temporally occluded overhand serves, choosing among either two or three possible landing sectors. Response accuracy and response times were measured. For the incongruent stimuli, the results revealed that the percentage of predictions in line with early auditory information was significantly higher than the respective percentage of predictions in line with early visual information. For unimodal stimuli, prediction accuracy was significantly higher on the basis of auditory information than on the basis of visual information, without any difference on response times. Taken together, the results highlighted the relevance of early auditory information for the prediction of volleyball serves' length.
Project description:Causal inference-the process of deciding whether two incoming signals come from the same source-is an important step in audiovisual (AV) speech perception. This research explored causal inference and perception of incongruent AV English consonants. Nine adults were presented auditory, visual, congruent AV, and incongruent AV consonant-vowel syllables. Incongruent AV stimuli included auditory and visual syllables with matched vowels, but mismatched consonants. Open-set responses were collected. For most incongruent syllables, participants were aware of the mismatch between auditory and visual signals (59.04%) or reported the auditory syllable (33.73%). Otherwise, participants reported the visual syllable (1.13%) or some other syllable (6.11%). Statistical analyses were used to assess whether visual distinctiveness and place, voice, and manner features predicted responses. Mismatch responses occurred more when the auditory and visual consonants were visually distinct, when place and manner differed across auditory and visual consonants, and for consonants with high visual accuracy. Auditory responses occurred more when the auditory and visual consonants were visually similar, when place and manner were the same across auditory and visual stimuli, and with consonants produced further back in the mouth. Visual responses occurred more when voicing and manner were the same across auditory and visual stimuli, and for front and middle consonants. Other responses were variable, but typically matched the visual place, auditory voice, and auditory manner of the input. Overall, results indicate that causal inference and incongruent AV consonant perception depend on salience and reliability of auditory and visual inputs and degree of redundancy between auditory and visual inputs. A parameter-free computational model of incongruent AV speech perception based on unimodal confusions, with a causal inference rule, was applied. Data from the current study present an opportunity to test and improve the generalizability of current AV speech integration models.
Project description:Speech perception is influenced by vision through a process of audiovisual integration. This is demonstrated by the McGurk illusion where visual speech (for example /ga/) dubbed with incongruent auditory speech (such as /ba/) leads to a modified auditory percept (/da/). Recent studies have indicated that perception of the incongruent speech stimuli used in McGurk paradigms involves mechanisms of both general and audiovisual speech specific mismatch processing and that general mismatch processing modulates induced theta-band (4-8 Hz) oscillations. Here, we investigated whether the theta modulation merely reflects mismatch processing or, alternatively, audiovisual integration of speech. We used electroencephalographic recordings from two previously published studies using audiovisual sine-wave speech (SWS), a spectrally degraded speech signal sounding nonsensical to naïve perceivers but perceived as speech by informed subjects. Earlier studies have shown that informed, but not naïve subjects integrate SWS phonetically with visual speech. In an N1/P2 event-related potential paradigm, we found a significant difference in theta-band activity between informed and naïve perceivers of audiovisual speech, suggesting that audiovisual integration modulates induced theta-band oscillations. In a McGurk mismatch negativity paradigm (MMN) where infrequent McGurk stimuli were embedded in a sequence of frequent audio-visually congruent stimuli we found no difference between congruent and McGurk stimuli. The infrequent stimuli in this paradigm are violating both the general prediction of stimulus content, and that of audiovisual congruence. Hence, we found no support for the hypothesis that audiovisual mismatch modulates induced theta-band oscillations. We also did not find any effects of audiovisual integration in the MMN paradigm, possibly due to the experimental design.
Project description:One of the central questions in cognitive neuroscience is the precise neural representation, or brain pattern, associated with a semantic category. In this study, we explored the influence of audiovisual stimuli on the brain patterns of concepts or semantic categories through a functional magnetic resonance imaging (fMRI) experiment. We used a pattern search method to extract brain patterns corresponding to two semantic categories: "old people" and "young people." These brain patterns were elicited by semantically congruent audiovisual, semantically incongruent audiovisual, unimodal visual, and unimodal auditory stimuli belonging to the two semantic categories. We calculated the reproducibility index, which measures the similarity of the patterns within the same category. We also decoded the semantic categories from these brain patterns. The decoding accuracy reflects the discriminability of the brain patterns between two categories. The results showed that both the reproducibility index of brain patterns and the decoding accuracy were significantly higher for semantically congruent audiovisual stimuli than for unimodal visual and unimodal auditory stimuli, while the semantically incongruent stimuli did not elicit brain patterns with significantly higher reproducibility index or decoding accuracy. Thus, the semantically congruent audiovisual stimuli enhanced the within-class reproducibility of brain patterns and the between-class discriminability of brain patterns, and facilitate neural representations of semantic categories or concepts. Furthermore, we analyzed the brain activity in superior temporal sulcus and middle temporal gyrus (STS/MTG). The strength of the fMRI signal and the reproducibility index were enhanced by the semantically congruent audiovisual stimuli. Our results support the use of the reproducibility index as a potential tool to supplement the fMRI signal amplitude for evaluating multimodal integration.
Project description:We used human electroencephalogram to study early audiovisual integration of dynamic angry and neutral expressions. An auditory-only condition served as a baseline for the interpretation of integration effects. In the audiovisual conditions, the validity of visual information was manipulated using facial expressions that were either emotionally congruent or incongruent with the vocal expressions. First, we report an N1 suppression effect for angry compared with neutral vocalizations in the auditory-only condition. Second, we confirm early integration of congruent visual and auditory information as indexed by a suppression of the auditory N1 and P2 components in the audiovisual compared with the auditory-only condition. Third, audiovisual N1 suppression was modulated by audiovisual congruency in interaction with emotion: for neutral vocalizations, there was N1 suppression in both the congruent and the incongruent audiovisual conditions. For angry vocalizations, there was N1 suppression only in the congruent but not in the incongruent condition. Extending previous findings of dynamic audiovisual integration, the current results suggest that audiovisual N1 suppression is congruency- and emotion-specific and indicate that dynamic emotional expressions compared with non-emotional expressions are preferentially processed in early audiovisual integration.
Project description:The semantic variant of primary progressive aphasia (PPA-S) is diagnosed based on impaired single-word comprehension, but nonverbal impairments in face and object recognition can also be present, particularly in later disease stages. PPA-S is associated with focal atrophy in the left anterior temporal lobe (ATL), often accompanied by a lesser degree of atrophy in the right ATL. According to a dual-route account, the left ATL is critical for verbal access to conceptual knowledge while nonverbal access to conceptual knowledge depends upon the integrity of right ATL. Consistent with this view, single-word comprehension deficits in PPA-S have consistently been linked to the degree of atrophy in left ATL. In the current study we examined object processing and cortical thickness in 19 patients diagnosed with PPA-S, to evaluate the hypothesis that nonverbal object impairments would instead be determined by the amount of atrophy in the right ATL. All patients demonstrated inability to access conceptual knowledge on standardized tests with word stimuli: they were unable to match spoken words with their corresponding pictures on the Peabody Picture Vocabulary Test. Only a minority of patients, however, performed abnormally on an experimental thematic verification task, which requires judgments as to whether pairs of object pictures are thematically-associated, and does not rely on auditory or visual word input. The entire PPA-S group showed cortical thinning in left ATL, but atrophy in right ATL was more prominent in the subgroup with low verification scores. Thematic verification scores were correlated with cortical thickness in the right rather than left ATL, an asymmetric mapping which persisted when controlling for the degree of atrophy in the contralateral hemisphere. These results are consistent with a dual-route account of conceptual knowledge: breakdown of the verbal left hemispheric route produces an aphasic syndrome, which is only accompanied by visual object processing impairments when the nonverbal right hemispheric route is also compromised.
Project description:The visual color-word Stroop task is widely used in clinical and research settings as a measure of cognitive control. Numerous neuroimaging studies have used color-word Stroop tasks to investigate the neural resources supporting cognitive control, but to our knowledge all have used unimodal (typically visual) Stroop paradigms. Thus, it is possible that this classic measure of cognitive control is not capturing the resources involved in multisensory cognitive control. The audiovisual integration and crossmodal correspondence literatures identify regions sensitive to congruency of auditory and visual stimuli, but it is unclear how these regions relate to the unimodal cognitive control literature. In this study we aimed to identify brain regions engaged by crossmodal cognitive control during an audiovisual color-word Stroop task, and how they relate to previous unimodal Stroop and audiovisual integration findings. First, we replicated previous behavioral audiovisual Stroop findings in an fMRI-adapted audiovisual Stroop paradigm: incongruent visual information increased reaction time towards an auditory stimulus and congruent visual information decreased reaction time. Second, we investigated the brain regions supporting cognitive control during an audiovisual color-word Stroop task using fMRI. Similar to unimodal cognitive control tasks, a left superior parietal region exhibited an interference effect of visual information on the auditory stimulus. This superior parietal region was also identified using a standard audiovisual integration localizing procedure, indicating that audiovisual integration resources are sensitive to cognitive control demands. Facilitation of the auditory stimulus by congruent visual information was found in posterior superior temporal cortex, including in the posterior STS which has been found to support audiovisual integration. The dorsal anterior cingulate cortex, often implicated in unimodal Stroop tasks, was not modulated by the audiovisual Stroop task. Overall the findings indicate that an audiovisual color-word Stroop task engages overlapping resources with audiovisual integration and overlapping but distinct resources compared to unimodal Stroop tasks.
Project description:Visual inputs can distort auditory perception, and accurate auditory processing requires the ability to detect and ignore visual input that is simultaneous and incongruent with auditory information. However, the neural basis of this auditory selection from audiovisual information is unknown, whereas integration process of audiovisual inputs is intensively researched. Here, we tested the hypothesis that the inferior frontal gyrus (IFG) and superior temporal sulcus (STS) are involved in top-down and bottom-up processing, respectively, of target auditory information from audiovisual inputs. We recorded high gamma activity (HGA), which is associated with neuronal firing in local brain regions, using electrocorticography while patients with epilepsy judged the syllable spoken by a voice while looking at a voice-congruent or -incongruent lip movement from the speaker. The STS exhibited stronger HGA if the patient was presented with information of large audiovisual incongruence than of small incongruence, especially if the auditory information was correctly identified. On the other hand, the IFG exhibited stronger HGA in trials with small audiovisual incongruence when patients correctly perceived the auditory information than when patients incorrectly perceived the auditory information due to the mismatched visual information. These results indicate that the IFG and STS have dissociated roles in selective auditory processing, and suggest that the neural basis of selective auditory processing changes dynamically in accordance with the degree of incongruity between auditory and visual information.