Short-Term Audiovisual Spatial Training Enhances Electrophysiological Correlates of Auditory Selective Spatial Attention.
ABSTRACT: Audiovisual cross-modal training has been proposed as a tool to improve human spatial hearing. Here, we investigated training-induced modulations of event-related potential (ERP) components that have been associated with processes of auditory selective spatial attention when a speaker of interest has to be localized in a multiple speaker ("cocktail-party") scenario. Forty-five healthy participants were tested, including younger (19-29 years; n = 21) and older (66-76 years; n = 24) age groups. Three conditions of short-term training (duration 15 min) were compared, requiring localization of non-speech targets under "cocktail-party" conditions with (1) synchronous presentation of co-localized auditory-target and visual stimuli (audiovisual-congruency training), (2) immediate visual feedback on correct or incorrect localization responses (visual-feedback training), or (3) spatially incongruent auditory-target and visual stimuli presented at random positions with synchronous onset (control condition). Prior to and after training, participants were tested in an auditory spatial attention task (15 min), requiring localization of a predefined spoken word out of three distractor words, which were presented with synchronous stimulus onset from different positions. Peaks of ERP components were analyzed with a specific focus on the N2, which is known to be a correlate of auditory selective spatial attention. N2 amplitudes were significantly larger after audiovisual-congruency training compared with the remaining training conditions for younger, but not older, participants. Also, at the time of the N2, distributed source analysis revealed an enhancement of neural activity induced by audiovisual-congruency training in dorsolateral prefrontal cortex (Brodmann area 9) for the younger group. These findings suggest that, even on a short time scale, cross-modal processes induced by audiovisual-congruency training under "cocktail-party" conditions enhanced correlates of auditory selective spatial attention.
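As a rough illustration of the peak analysis described in the abstract, the sketch below extracts the N2 peak amplitude and latency from a single evoked waveform within a fixed search window. The waveform, sampling rate, electrode, and 200-350 ms window are hypothetical placeholders, not the study's actual parameters.

```python
import numpy as np

# Hypothetical evoked ERP waveform (microvolts) at one electrode,
# sampled at 500 Hz from -100 ms to 600 ms around stimulus onset.
fs = 500.0
times = np.arange(-0.1, 0.6, 1.0 / fs)
evoked = np.random.default_rng(0).normal(0.0, 0.5, times.size)  # placeholder data

def peak_in_window(waveform_uv, times_s, t_min, t_max, polarity="negative"):
    """Return (peak amplitude, peak latency) within [t_min, t_max] seconds."""
    mask = (times_s >= t_min) & (times_s <= t_max)
    segment = waveform_uv[mask]
    idx = segment.argmin() if polarity == "negative" else segment.argmax()
    return segment[idx], times_s[mask][idx]

# The N2 is a negative deflection; a 200-350 ms search window is assumed here.
n2_amp, n2_lat = peak_in_window(evoked, times, 0.200, 0.350)
print(f"N2 peak: {n2_amp:.2f} uV at {n2_lat * 1000:.0f} ms")
```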
Project description:At cocktail parties, our brains often simultaneously receive visual and auditory information. Although the cocktail party problem has been widely investigated under auditory-only settings, the effects of audiovisual inputs have not. This study explored the effects of audiovisual inputs in a simulated cocktail party. In our fMRI experiment, each congruent audiovisual stimulus was a synthesis of 2 facial movie clips, each of which could be classified into 1 of 2 emotion categories (crying and laughing). Visual-only (faces) and auditory-only stimuli (voices) were created by extracting the visual and auditory contents from the synthesized audiovisual stimuli. Subjects were instructed to selectively attend to 1 of the 2 objects contained in each stimulus and to judge its emotion category in the visual-only, auditory-only, and audiovisual conditions. The neural representations of the emotion features were assessed by calculating decoding accuracy and brain pattern-related reproducibility index based on the fMRI data. We compared the audiovisual condition with the visual-only and auditory-only conditions and found that audiovisual inputs enhanced the neural representations of emotion features of the attended objects instead of the unattended objects. This enhancement might partially explain the benefits of audiovisual inputs for the brain to solve the cocktail party problem.
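The decoding-accuracy measure mentioned above is typically obtained by cross-validated classification of trial-wise fMRI activity patterns. Below is a minimal sketch of that general idea on synthetic data; the trial counts, voxel dimensionality, classifier, and cross-validation scheme are illustrative assumptions rather than the study's exact pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for trial-wise voxel patterns: 80 trials x 200 voxels,
# each trial labeled by the attended emotion category (0 = crying, 1 = laughing).
X = rng.normal(size=(80, 200))
y = rng.integers(0, 2, size=80)
X[y == 1, :20] += 0.5  # inject a weak class-dependent signal

# Cross-validated decoding accuracy; chance level is 0.5 for two classes.
scores = cross_val_score(LinearSVC(max_iter=5000), X, y, cv=5)
print(f"Mean decoding accuracy: {scores.mean():.2f}")
```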
Project description:Information integration is considered a hallmark of human consciousness. Recent research has challenged this tenet by showing multisensory interactions in the absence of awareness. This psychophysics study assessed the impact of spatial and semantic correspondences on audiovisual binding in the presence and absence of visual awareness by combining forward-backward masking with spatial ventriloquism. Observers were presented with object pictures and synchronous sounds that were spatially and/or semantically congruent or incongruent. On each trial observers located the sound, identified the picture and rated the picture's visibility. We observed a robust ventriloquist effect for subjectively visible and invisible pictures, indicating that pictures that evade our perceptual awareness influence where we perceive sounds. Critically, semantic congruency enhanced these visual biases on perceived sound location only when the picture entered observers' awareness. Our results demonstrate that crossmodal influences operating from vision to audition and vice versa are interactively controlled by spatial and semantic congruency in the presence of awareness. However, when visual processing is disrupted by masking procedures, audiovisual interactions no longer depend on semantic correspondences.
Project description:We used human electroencephalogram to study early audiovisual integration of dynamic angry and neutral expressions. An auditory-only condition served as a baseline for the interpretation of integration effects. In the audiovisual conditions, the validity of visual information was manipulated using facial expressions that were either emotionally congruent or incongruent with the vocal expressions. First, we report an N1 suppression effect for angry compared with neutral vocalizations in the auditory-only condition. Second, we confirm early integration of congruent visual and auditory information as indexed by a suppression of the auditory N1 and P2 components in the audiovisual compared with the auditory-only condition. Third, audiovisual N1 suppression was modulated by audiovisual congruency in interaction with emotion: for neutral vocalizations, there was N1 suppression in both the congruent and the incongruent audiovisual conditions. For angry vocalizations, there was N1 suppression only in the congruent but not in the incongruent condition. Extending previous findings of dynamic audiovisual integration, the current results suggest that audiovisual N1 suppression is congruency- and emotion-specific and indicate that dynamic emotional expressions compared with non-emotional expressions are preferentially processed in early audiovisual integration.
Project description:The goal of this study was to investigate how cognitive factors influence performance in a multi-talker, "cocktail-party" like environment in musicians and non-musicians. This was achieved by relating performance in a spatial hearing task to cognitive processing abilities assessed using measures of executive function (EF) and visual attention in musicians and non-musicians. For the spatial hearing task, a speech target was presented simultaneously with two intelligible speech maskers that were either colocated with the target (0° azimuth) or were symmetrically separated from the target in azimuth (at ±15°). EF assessment included measures of cognitive flexibility, inhibition control and auditory working memory. Selective attention was assessed in the visual domain using a multiple object tracking task (MOT). For the MOT task, the observers were required to track target dots (n = 1,2,3,4,5) in the presence of interfering distractor dots. Musicians performed significantly better than non-musicians in the spatial hearing task. For the EF measures, musicians showed better performance on measures of auditory working memory compared to non-musicians. Furthermore, across all individuals, a significant correlation was observed between performance on the spatial hearing task and measures of auditory working memory. This result suggests that individual differences in performance in a cocktail party-like environment may depend in part on cognitive factors such as auditory working memory. Performance in the MOT task did not differ between groups. However, across all individuals, a significant correlation was found between performance in the MOT and spatial hearing tasks. A stepwise multiple regression analysis revealed that musicianship and performance on the MOT task significantly predicted performance on the spatial hearing task. Overall, these findings confirm the relationship between musicianship and cognitive factors including domain-general selective attention and working memory in solving the "cocktail party problem".
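Stepwise multiple regression, as used above to predict spatial-hearing performance from musicianship and MOT performance, can be approximated by greedy forward selection over candidate predictors. The sketch below illustrates this on simulated data; the predictor names, effect sizes, and the 0.05 entry criterion are assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 40

# Hypothetical predictors and outcome (placeholder values).
musicianship = rng.integers(0, 2, n).astype(float)   # 0 = non-musician, 1 = musician
mot_score = rng.normal(0.7, 0.1, n)                   # multiple object tracking accuracy
working_memory = rng.normal(0.0, 1.0, n)              # auditory working memory score
spatial_hearing = 0.8 * musicianship + 2.0 * mot_score + rng.normal(0.0, 0.5, n)

candidates = {"musicianship": musicianship, "mot_score": mot_score,
              "working_memory": working_memory}

def forward_stepwise(y, candidates, alpha=0.05):
    """Greedy forward selection: repeatedly add the candidate predictor with the
    smallest p-value, as long as that p-value is below alpha."""
    selected = {}
    while True:
        best_name, best_p = None, alpha
        for name, x in candidates.items():
            if name in selected:
                continue
            X = sm.add_constant(np.column_stack(list(selected.values()) + [x]))
            p = sm.OLS(y, X).fit().pvalues[-1]
            if p < best_p:
                best_name, best_p = name, p
        if best_name is None:
            return selected
        selected[best_name] = candidates[best_name]

model_vars = forward_stepwise(spatial_hearing, candidates)
print("Selected predictors:", list(model_vars.keys()))
```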
Project description:The visual color-word Stroop task is widely used in clinical and research settings as a measure of cognitive control. Numerous neuroimaging studies have used color-word Stroop tasks to investigate the neural resources supporting cognitive control, but to our knowledge all have used unimodal (typically visual) Stroop paradigms. Thus, it is possible that this classic measure of cognitive control is not capturing the resources involved in multisensory cognitive control. The audiovisual integration and crossmodal correspondence literatures identify regions sensitive to congruency of auditory and visual stimuli, but it is unclear how these regions relate to the unimodal cognitive control literature. In this study we aimed to identify brain regions engaged by crossmodal cognitive control during an audiovisual color-word Stroop task, and how they relate to previous unimodal Stroop and audiovisual integration findings. First, we replicated previous behavioral audiovisual Stroop findings in an fMRI-adapted audiovisual Stroop paradigm: incongruent visual information increased reaction time towards an auditory stimulus and congruent visual information decreased reaction time. Second, we investigated the brain regions supporting cognitive control during an audiovisual color-word Stroop task using fMRI. Similar to unimodal cognitive control tasks, a left superior parietal region exhibited an interference effect of visual information on the auditory stimulus. This superior parietal region was also identified using a standard audiovisual integration localizing procedure, indicating that audiovisual integration resources are sensitive to cognitive control demands. Facilitation of the auditory stimulus by congruent visual information was found in posterior superior temporal cortex, including in the posterior STS which has been found to support audiovisual integration. The dorsal anterior cingulate cortex, often implicated in unimodal Stroop tasks, was not modulated by the audiovisual Stroop task. Overall the findings indicate that an audiovisual color-word Stroop task engages overlapping resources with audiovisual integration and overlapping but distinct resources compared to unimodal Stroop tasks.
Project description:Sound localization requires the integration in the brain of auditory spatial cues generated by interactions with the external ears, head and body. Perceptual learning studies have shown that the relative weighting of these cues can change in a context-dependent fashion if their relative reliability is altered. One factor that may influence this process is vision, which tends to dominate localization judgments when both modalities are present and induces a recalibration of auditory space if they become misaligned. It is not known, however, whether vision can alter the weighting of individual auditory localization cues. Using virtual acoustic space stimuli, we measured changes in subjects' sound localization biases and binaural localization cue weights after ~50 min of training on audiovisual tasks in which visual stimuli were either informative or not about the location of broadband sounds. Four different spatial configurations were used in which we varied the relative reliability of the binaural cues: interaural time differences (ITDs) and frequency-dependent interaural level differences (ILDs). In most subjects and experiments, ILDs were weighted more highly than ITDs before training. When visual cues were spatially uninformative, some subjects showed a reduction in auditory localization bias and the relative weighting of ILDs increased after training with congruent binaural cues. ILDs were also upweighted if they were paired with spatially-congruent visual cues, and the largest group-level improvements in sound localization accuracy occurred when both binaural cues were matched to visual stimuli. These data suggest that binaural cue reweighting reflects baseline differences in the relative weights of ILDs and ITDs, but is also shaped by the availability of congruent visual stimuli. Training subjects with consistently misaligned binaural and visual cues produced the ventriloquism aftereffect, i.e., a corresponding shift in auditory localization bias, without affecting the inter-subject variability in sound localization judgments or their binaural cue weights. Our results show that the relative weighting of different auditory localization cues can be changed by training in ways that depend on their reliability as well as the availability of visual spatial information, with the largest improvements in sound localization likely to result from training with fully congruent audiovisual information.
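One common way to estimate the relative weighting of binaural cues is to regress listeners' localization responses onto the azimuths separately specified by the ITD and ILD cues on cue-conflict trials. The snippet below sketches that idea with simulated data; the cue ranges, noise levels, and normalization are assumptions, not the study's exact analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials = 200

# Azimuth (degrees) signalled by each binaural cue on cue-conflict trials.
itd_azimuth = rng.uniform(-30, 30, n_trials)
ild_azimuth = itd_azimuth + rng.normal(0, 10, n_trials)  # cues partially decoupled

# Simulated responses from a listener who weights ILDs more heavily than ITDs.
responses = 0.3 * itd_azimuth + 0.7 * ild_azimuth + rng.normal(0, 3, n_trials)

# Least-squares estimate of the cue weights (plus a constant bias term).
X = np.column_stack([itd_azimuth, ild_azimuth, np.ones(n_trials)])
w_itd, w_ild, bias = np.linalg.lstsq(X, responses, rcond=None)[0]

# Express the ILD weight relative to the summed cue weights.
rel_ild = w_ild / (w_itd + w_ild)
print(f"ITD weight: {w_itd:.2f}, ILD weight: {w_ild:.2f}, relative ILD weight: {rel_ild:.2f}")
```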
Project description:Aging affects the interplay between peripheral and cortical auditory processing. Previous studies have demonstrated that older adults are less able to regulate afferent sensory information and are more sensitive to distracting information. Using auditory event-related potentials we investigated the role of cortical inhibition on auditory and audiovisual processing in younger and older adults. Across pure-tone, auditory, and audiovisual speech paradigms, older adults showed a consistent pattern of inhibitory deficits, manifested as increased P50 and/or N1 amplitudes and an absent or significantly reduced N2. Older adults were still able to use congruent visual articulatory information to aid auditory processing but appeared to require greater neural effort to resolve conflicts generated by incongruent visual information. In combination, the results provide support for the Inhibitory Deficit Hypothesis of aging. They extend previous findings into the audiovisual domain and highlight older adults' ability to benefit from congruent visual information during speech processing.
Project description:Seeing the image of a newscaster on a television set causes us to think that the sound coming from the loudspeaker is actually coming from the screen. How images capture sounds is mysterious because the brain uses different methods for determining the locations of visual versus auditory stimuli. The retina senses the locations of visual objects with respect to the eyes, whereas differences in sound characteristics across the ears indicate the locations of sound sources referenced to the head. Here, we tested which reference frame (RF) is used when vision recalibrates perceived sound locations. Visually guided biases in sound localization were induced in seven humans and two monkeys who made eye movements to auditory or audiovisual stimuli. On audiovisual (training) trials, the visual component of the targets was displaced laterally by 5-6 degrees. Interleaved auditory-only (probe) trials served to evaluate the effect of experience with mismatched visual stimuli on auditory localization. We found that the displaced visual stimuli induced a ventriloquism aftereffect in both humans (approximately 50% of the displacement size) and monkeys (approximately 25%), but only for locations around the trained spatial region, showing that audiovisual recalibration can be spatially specific. We tested the reference frame in which the recalibration occurs. On probe trials, we varied eye position relative to the head to dissociate head- from eye-centered RFs. Results indicate that both humans and monkeys use a mixture of the two RFs, suggesting that the neural mechanisms involved in ventriloquism occur in brain region(s) using a hybrid RF for encoding spatial information.
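The aftereffect sizes quoted above (roughly 50% of the displacement in humans and 25% in monkeys) express the training-induced shift in auditory-only localization as a fraction of the audiovisual offset. A minimal sketch of that computation, using made-up response values, is shown below.

```python
import numpy as np

# Hypothetical auditory-only localization responses (degrees) for one probe
# location, before and after training with a +5 degree visual displacement.
pre_training = np.array([-1.0, 0.5, 0.2, -0.3, 0.8])
post_training = np.array([1.8, 2.6, 2.1, 2.9, 2.4])
visual_displacement = 5.0

# Aftereffect: shift in mean auditory localization, as a fraction of the offset.
shift = post_training.mean() - pre_training.mean()
aftereffect = shift / visual_displacement
print(f"Ventriloquism aftereffect: {aftereffect:.0%} of the visual displacement")
```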
Project description:Background: A prevailing view is that audiovisual integration requires temporally coincident signals. However, a recent study failed to find any evidence for audiovisual integration in visual search even when using synchronized audiovisual events. An important question is what information is critical to observe audiovisual integration. Methodology/principal findings: Here we demonstrate that temporal coincidence (i.e., synchrony) of auditory and visual components can trigger audiovisual interaction in cluttered displays and consequently produce very fast and efficient target identification. In visual search experiments, subjects found a modulating visual target vastly more efficiently when it was paired with a synchronous auditory signal. By manipulating the kind of temporal modulation (sine wave vs. square wave vs. difference wave; harmonic sine-wave synthesis; gradient of onset/offset ramps) we show that abrupt visual events are required for this search efficiency to occur, and that sinusoidal audiovisual modulations do not support efficient search. Conclusions/significance: Thus, audiovisual temporal alignment will only lead to benefits in visual search if the changes in the component signals are both synchronized and transient. We propose that transient signals are necessary in synchrony-driven binding to avoid spurious interactions with unrelated signals when these occur close together in time.
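Search efficiency in experiments like this is commonly summarized by the slope of the reaction-time-by-set-size function (ms per item), with shallow slopes indicating efficient, pop-out-like search. The snippet below sketches that computation on made-up data; the set sizes, reaction times, and condition labels are illustrative assumptions.

```python
import numpy as np

# Hypothetical mean reaction times (ms) per display set size, with and
# without a synchronous auditory signal accompanying the visual target.
set_sizes = np.array([4, 8, 16, 24])
rt_no_sound = np.array([650, 820, 1150, 1420])   # inefficient, serial-like search
rt_with_sound = np.array([610, 630, 655, 670])   # efficient search

def search_slope(set_sizes, rts):
    """Slope of the RT x set-size function, in ms per item."""
    slope, _intercept = np.polyfit(set_sizes, rts, deg=1)
    return slope

print(f"Slope without sound: {search_slope(set_sizes, rt_no_sound):.1f} ms/item")
print(f"Slope with synchronous sound: {search_slope(set_sizes, rt_with_sound):.1f} ms/item")
```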
Project description:Although emotional audiovisual integration has been investigated previously, whether emotional audiovisual integration is affected by the spatial allocation of visual attention is currently unknown. To examine this question, a variant of the exogenous spatial cueing paradigm was adopted, in which stimuli varying by facial expressions and nonverbal affective prosody were used to express six basic emotions (happiness, anger, disgust, sadness, fear, surprise) via a visual, an auditory, or an audiovisual modality. The emotional stimuli were preceded by a non-predictive cue that was used to attract participants' visual attention. The results showed significantly higher accuracy and quicker response times in response to bimodal audiovisual stimuli than to unimodal visual or auditory stimuli for emotional perception under both valid and invalid cue conditions. The auditory facilitation effect was stronger than the visual facilitation effect under exogenous attention for the six emotions tested. Larger auditory enhancement was induced when the target was presented at the expected location than at the unexpected location. For emotional perception, happiness showed the largest auditory enhancement among the six emotions. However, the influence of the exogenous cueing effect on emotional perception seemed to be absent.