Window of audio-visual simultaneity is unaffected by spatio-temporal visual clutter.
ABSTRACT: In the present study we investigate the rules governing the perception of audio-visual synchrony within spatio-temporally cluttered visual environments. Participants viewed a ring of 19 discs modulating in luminance while hearing an amplitude-modulating tone. Each disc modulated with a unique temporal phase (40 ms intervals), with only one synchronised to the tone. Participants searched for the synchronised disc, whose spatial location varied randomly across trials. Square-wave modulation facilitated search: the synchronised disc was frequently chosen, with tight response distributions centred near zero phase lag. In the sinusoidal condition responses were distributed equally over the 19 discs regardless of phase. To investigate whether subjective synchrony in the square-wave condition was limited by spatial or temporal factors, we repeated the experiment with either reduced spatial density (9 discs) or reduced temporal density (80 ms phase intervals). Reduced temporal density greatly facilitated synchrony perception but left the synchrony bandwidth unchanged, while no influence of spatial density was found. We conclude that audio-visual synchrony perception is not strongly constrained by the spatial or temporal density of the visual display, but by a temporal window within which audio-visual events are perceived as synchronous, with a full bandwidth of ~185 ms.
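As a rough illustration of the analysis described above (not the authors' code), the sketch below fits a Gaussian to a hypothetical distribution of chosen-disc phase lags and reads off the synchrony bandwidth as the full width at half maximum; all counts and starting values are made up.

```python
# Minimal sketch: estimating a synchrony window from the distribution of
# phase lags of the disc chosen on each trial. Response counts are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

# 19 discs modulating at 40 ms phase steps; lag 0 is the disc synchronised with the tone.
phase_lags_ms = np.arange(-9, 10) * 40.0

# Hypothetical counts of how often the disc at each lag was chosen
# (square-wave condition: responses cluster around zero lag).
counts = np.array([3, 2, 4, 3, 5, 6, 9, 14, 30, 52, 33, 16, 10, 7, 5, 4, 3, 2, 3])

def gaussian(x, amp, mu, sigma, base):
    """Gaussian response profile on top of a uniform guessing baseline."""
    return base + amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

popt, _ = curve_fit(gaussian, phase_lags_ms, counts, p0=[50, 0, 80, 3])
amp, mu, sigma, base = popt

# One common bandwidth definition: full width at half maximum of the fitted profile.
fwhm_ms = 2.355 * sigma
print(f"centre = {mu:.0f} ms, FWHM (synchrony bandwidth) = {fwhm_ms:.0f} ms")
```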
Project description: Audio-visual integration relies on temporal synchrony between visual and auditory inputs. However, visual and auditory signals differ in their physical travel times and neural transmission speeds, so audio-visual synchrony perception must operate flexibly. The processing speed of visual stimuli affects the perception of audio-visual synchrony. The present study examined how the visual field in which visual stimuli are presented affects the processing of audio-visual temporal synchrony. The point of subjective simultaneity, the temporal binding window, and the rapid recalibration effect were measured using temporal order judgment, simultaneity judgment, and stream/bounce perception tasks, because these three paradigms are thought to tap different temporal-processing mechanisms. The results indicate that, in the temporal order judgment task conducted in this study, auditory stimuli had to be presented earlier relative to visual stimuli in the central visual field than in the peripheral visual field for the stimuli to be perceived as simultaneous. Meanwhile, the subjective simultaneity bandwidth was broader in the central visual field than in the peripheral visual field during the simultaneity judgment task. In the stream/bounce perception task, neither the point of subjective simultaneity nor the temporal binding window differed between the two visual fields. Moreover, rapid recalibration occurred in both visual fields during the simultaneity judgment task. However, during the temporal order judgment task and stream/bounce perception, rapid recalibration occurred only in the central visual field. These results suggest that differences in visual processing speed across the visual field modulate the temporal processing of audio-visual stimuli. Furthermore, the three tasks, temporal order judgment, simultaneity judgment, and stream/bounce perception, each have distinct functional characteristics for audio-visual synchrony perception. Future studies are needed to confirm whether compensation in later cortical visual pathways for differences in temporal resolution across the visual field contributes to the observed visual field differences in audio-visual temporal synchrony.
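For readers unfamiliar with how the point of subjective simultaneity (PSS) and temporal binding window (TBW) are extracted, the following sketch fits a Gaussian-shaped simultaneity-judgment function to hypothetical data; it is not the study's analysis pipeline, and other TBW definitions (e.g. width at a fixed criterion) are also used in the literature.

```python
# Minimal sketch: fit a Gaussian psychometric function to simultaneity-judgment
# data; the PSS is the peak location and the TBW is taken here as the width
# parameter. SOAs and response proportions below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

# Stimulus onset asynchronies (negative = audio leads), proportion "simultaneous".
soa_ms = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300], dtype=float)
p_simultaneous = np.array([0.10, 0.30, 0.70, 0.85, 0.92, 0.88, 0.75, 0.35, 0.12])

def sj_curve(soa, pss, sigma, peak):
    """Gaussian-shaped simultaneity-judgment function."""
    return peak * np.exp(-0.5 * ((soa - pss) / sigma) ** 2)

(pss, sigma, peak), _ = curve_fit(sj_curve, soa_ms, p_simultaneous, p0=[0, 150, 0.9])
print(f"PSS = {pss:.1f} ms, TBW (±1 SD) = ±{sigma:.0f} ms")
```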
Project description: Adults combine information from different sensory modalities to estimate object properties such as size or location. This process is optimal in that (i) sensory information is weighted according to relative reliability: more reliable estimates have more influence on the combined estimate and (ii) the combined estimate is more reliable than the component uni-modal estimates. Previous studies suggest that optimal sensory integration does not emerge until around 10 years of age. Younger children rely on a single modality or combine information using inappropriate sensory weights. Children aged 4-11 and adults completed a simple audio-visual task in which they reported either the number of beeps or the number of flashes in uni-modal and bi-modal conditions. In bi-modal trials, beeps and flashes differed in number by 0, 1 or 2. Mutual interactions between the sensory signals were evident at all ages: the reported number of flashes was influenced by the number of simultaneously presented beeps and vice versa. Furthermore, for all ages, the relative strength of these interactions was predicted by the relative reliabilities of the two modalities; in other words, all observers weighted the signals appropriately. The degree of cross-modal interaction decreased with age: the youngest observers could not ignore the task-irrelevant modality; they fully combined vision and audition such that they perceived equal numbers of flashes and beeps for bi-modal stimuli. Older observers showed much smaller effects of the task-irrelevant modality. Do these interactions reflect optimal integration? Full or partial cross-modal integration predicts improved reliability in bi-modal conditions. In contrast, switching between modalities reduces reliability. Model comparison suggests that older observers employed partial integration, whereas younger observers (up to around 8 years) did not integrate, but followed a sub-optimal switching strategy, responding according to either visual or auditory information on each trial.
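The optimal (maximum-likelihood) combination rule referred to above has a standard closed form; the sketch below illustrates it with made-up noise levels and estimates, and is not the authors' model-comparison code.

```python
# Minimal sketch of reliability-weighted (maximum-likelihood) cue combination.
# Noise levels and uni-modal estimates are hypothetical.
import numpy as np

sigma_v, sigma_a = 1.0, 2.0          # hypothetical uni-modal noise (SDs)
s_v, s_a = 9.0, 11.0                 # hypothetical uni-modal estimates (e.g. counts)

# Weights are proportional to reliability (inverse variance).
w_v = (1 / sigma_v**2) / (1 / sigma_v**2 + 1 / sigma_a**2)
w_a = 1 - w_v

s_combined = w_v * s_v + w_a * s_a
var_combined = (sigma_v**2 * sigma_a**2) / (sigma_v**2 + sigma_a**2)

print(f"weights: vision {w_v:.2f}, audition {w_a:.2f}")
print(f"combined estimate {s_combined:.2f}, variance {var_combined:.2f} "
      f"(< smallest uni-modal variance {min(sigma_v, sigma_a)**2:.2f})")
```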
Project description: Temporal recalibration of cross-modal synchrony has been proposed as a mechanism to compensate for timing differences between sensory modalities. However, far from the rich complexity of everyday sensory environments, most studies to date have examined recalibration with isolated cross-modal pairings. Here, we hypothesize that selective attention might provide an effective filter to help resolve which stimuli are selected when multiple events compete for recalibration. We addressed this question by testing audio-visual recalibration following an adaptation phase in which two opposing audio-visual asynchronies were present. The direction of voluntary visual attention, and therefore which of the two possible asynchronies (flash leading or flash lagging) was attended, was manipulated using colour as a selection criterion. We found a shift in the point of subjective audio-visual simultaneity as a function of whether the observer had focused attention on audio-then-flash or flash-then-audio groupings during the adaptation phase. A baseline adaptation condition revealed that this effect of endogenous attention was effective only toward the lagging flash. This hints at the role of exogenous capture and/or additional endogenous effects producing an asymmetry toward the leading flash. We conclude that selective attention helps selected audio-visual pairings to be combined and subsequently adjusted in time, but stimulus organization exerts a strong impact on recalibration. We tentatively hypothesize that the resolution of recalibration in complex scenarios involves the orchestration of top-down selection mechanisms and stimulus-driven processes.
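In studies of this kind the attentional recalibration effect is typically read out as the shift in PSS between adaptation conditions; the brief sketch below shows one way to quantify and test such a shift, using entirely hypothetical per-participant PSS values rather than the study's data.

```python
# Minimal sketch: recalibration as the per-participant PSS shift between
# adaptation conditions, tested with a paired t-test. Values are hypothetical.
import numpy as np
from scipy import stats

# Fitted PSS (ms) after attending flash-lagging vs flash-leading pairings.
pss_attend_flash_lag = np.array([18.0, 25.0, 12.0, 30.0, 22.0, 15.0, 27.0, 20.0])
pss_attend_flash_lead = np.array([5.0, 10.0, -2.0, 14.0, 8.0, 3.0, 12.0, 6.0])

shift = pss_attend_flash_lag - pss_attend_flash_lead
t, p = stats.ttest_rel(pss_attend_flash_lag, pss_attend_flash_lead)
print(f"mean recalibration shift = {shift.mean():.1f} ms, t = {t:.2f}, p = {p:.3f}")
```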
Project description: Purpose: Aging affects a variety of visual functions. In this study, we aim to quantitatively investigate the temporal characteristics of visual processing in aging. Methods: Twelve younger (24.1 ± 1.6 years) and 12 older observers (58.4 ± 3.6 years) participated in the study. All participants had normal or corrected-to-normal vision. The contrast thresholds of the participants were measured using an orientation discrimination task with white external noise masks. The target-mask stimulus onset asynchronies were 16.7 ms, 33.4 ms, 50.0 ms, 83.4 ms, and ∞ (no external noise masks) in separate conditions. The signal stimulus was carefully chosen such that it was equally visible for the younger and older participants. An elaborated perceptual template model (ePTM) was fit to the data of each participant. Results: Without masks, there was no difference in contrast thresholds between the younger and older groups (P = 0.707). With masks, contrast thresholds in the older group were elevated more than those in the younger group, and the pattern of threshold elevation differed between the two groups. The ePTM fitted the data well, with the older observers having lower template gains than the younger observers (P = 3.58 × 10⁻⁶). A further analysis of the weight parameters of the temporal window revealed that the older observers had a flatter temporal window than the younger observers (P = 0.025). Conclusions: Age-related temporal processing deficits were found in older observers with normal contrast sensitivity to the signal stimuli. The deficits were accounted for by the inferior temporal processing window of the visual system in aging.
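To illustrate the kind of model involved, the sketch below uses the standard perceptual template model (PTM) threshold formula and scales the external noise at each target-mask SOA by a temporal-window weight. This is a simplified reading of the elaborated PTM, not the authors' implementation, and all parameter values (window weights, mask contrast, gains) are made up.

```python
# Illustrative sketch: standard PTM threshold formula with external noise at
# each SOA scaled by a hypothetical temporal-window weight.
import numpy as np

def ptm_threshold(n_ext, beta, gamma, n_add, n_mul, d_prime):
    """Contrast threshold predicted by the standard PTM for external noise n_ext."""
    num = (1 + n_mul**2) * n_ext**(2 * gamma) + n_add**2
    den = 1 / d_prime**2 - n_mul**2
    return (1 / beta) * (num / den) ** (1 / (2 * gamma))

soas_ms = np.array([16.7, 33.4, 50.0, 83.4])
window_weights = np.array([1.0, 0.8, 0.5, 0.2])   # hypothetical temporal window
n_ext_nominal = 0.33                              # hypothetical mask contrast

# A flatter window (e.g. [1.0, 0.9, 0.8, 0.6]) lets more mask noise through at
# long SOAs, raising thresholds there -- the pattern reported for older observers.
thresholds = ptm_threshold(window_weights * n_ext_nominal,
                           beta=1.2, gamma=2.0, n_add=0.01, n_mul=0.2, d_prime=1.5)
for soa, th in zip(soas_ms, thresholds):
    print(f"SOA {soa:5.1f} ms -> predicted threshold {th:.3f}")
```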
Project description: Bayesian models propose that multisensory integration depends on both sensory evidence (the likelihood) and priors indicating whether or not two inputs belong to the same event. The present study manipulated the prior for dynamic auditory and visual stimuli to co-occur and tested the predicted enhancement of multisensory binding as assessed with a simultaneity judgment task. In an initial learning phase participants were exposed to a subset of auditory-visual combinations. In the test phase the previously encountered audio-visual stimuli were presented together with new combinations of the auditory and visual stimuli from the learning phase, audio-visual stimuli containing one learned and one new sensory component, and audio-visual stimuli containing completely new auditory and visual material. Auditory-visual asynchrony was manipulated. A higher proportion of simultaneity judgments was observed for the learned cross-modal combinations than for new combinations of the same auditory and visual elements, as well as for all other conditions. This result suggests that prior exposure to certain auditory-visual combinations changed the expectation (i.e., the prior) that their elements belonged to the same event. As a result, multisensory binding became more likely despite unchanged sensory evidence for the auditory and visual elements.
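A toy Bayesian calculation (not the authors' model) makes the logic concrete: with the sensory likelihood of a given measured asynchrony held fixed, raising the prior that the two elements belong together raises the posterior evidence for a common event. All distributions and numbers below are assumptions for illustration.

```python
# Toy illustration: posterior probability of a common cause as a function of
# the prior, with the sensory likelihood held constant. Parameters are made up.
import numpy as np
from scipy.stats import norm

def p_common(measured_asynchrony_ms, prior_common, sigma_same=80.0, sigma_diff=300.0):
    """Posterior probability that the auditory and visual signals share a common cause."""
    like_same = norm.pdf(measured_asynchrony_ms, loc=0.0, scale=sigma_same)
    like_diff = norm.pdf(measured_asynchrony_ms, loc=0.0, scale=sigma_diff)
    return (like_same * prior_common /
            (like_same * prior_common + like_diff * (1 - prior_common)))

asynchrony = 120.0  # identical sensory evidence in both cases
print("learned pairing :", round(p_common(asynchrony, prior_common=0.7), 2))
print("new combination :", round(p_common(asynchrony, prior_common=0.4), 2))
```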
Project description: People can discriminate the synchrony between audio-visual scenes. However, the sensitivity of audio-visual synchrony perception can be affected by many factors. Using a simultaneity judgment task, the present study investigated whether the synchrony perception of complex audio-visual stimuli was affected by audio-visual causality and stimulus reliability. In Experiment 1, the results showed that audio-visual causality could increase one's sensitivity to audio-visual onset asynchrony (AVOA) of both action stimuli and speech stimuli. Moreover, participants were more tolerant of AVOA of speech stimuli than that of action stimuli in the high causality condition, whereas no significant difference between these two kinds of stimuli was found in the low causality condition. In Experiment 2, the speech stimuli were manipulated with either high or low stimulus reliability. The results revealed a significant interaction between audio-visual causality and stimulus reliability. Under the low causality condition, the percentage of "synchronous" responses of audio-visual intact stimuli was significantly higher than that of visual_intact/auditory_blurred stimuli and audio-visual blurred stimuli. In contrast, no significant difference among all levels of stimulus reliability was observed under the high causality condition. Our study supported the synergistic effect of top-down processing and bottom-up processing in audio-visual synchrony perception.
Project description: Perception in multi-sensory environments involves both grouping and segregation of events across sensory modalities. Temporal coincidence between events is considered a strong cue for resolving multisensory perception. However, differences in physical transmission and neural processing times amongst modalities complicate this picture. This is illustrated by cross-modal recalibration, whereby adaptation to audio-visual asynchrony produces shifts in perceived simultaneity. Here, we examined whether voluntary actions might serve as a temporal anchor for cross-modal recalibration. Participants were tested on an audio-visual simultaneity judgment task after an adaptation phase in which they had to synchronize voluntary actions with audio-visual pairs presented at a fixed asynchrony (vision leading or vision lagging). Our analysis focused on the magnitude of cross-modal recalibration to the adapted audio-visual asynchrony as a function of the nature of the actions during adaptation, putatively fostering either cross-modal grouping or segregation. We found larger temporal adjustments when actions promoted grouping rather than segregation of sensory events. However, a control experiment suggested that additional factors, such as attention to the planning/execution of actions, could have an impact on recalibration effects. Contrary to the view that cross-modal temporal organization is mainly driven by external factors related to the stimulus or environment, our findings add supporting evidence for the idea that perceptual adjustments strongly depend on the observer's inner states induced by motor and cognitive demands.
Project description: Statistical dependencies in the responses of sensory neurons govern both the amount of stimulus information conveyed and the means by which downstream neurons can extract it. Although a variety of measurements indicate the existence of such dependencies, their origin and importance for neural coding are poorly understood. Here we analyse the functional significance of correlated firing in a complete population of macaque parasol retinal ganglion cells using a model of multi-neuron spike responses. The model, with parameters fit directly to physiological data, simultaneously captures both the stimulus dependence and detailed spatio-temporal correlations in population responses, and provides two insights into the structure of the neural code. First, neural encoding at the population level is less noisy than one would expect from the variability of individual neurons: spike times are more precise, and can be predicted more accurately when the spiking of neighbouring neurons is taken into account. Second, correlations provide additional sensory information: optimal, model-based decoding that exploits the response correlation structure extracts 20% more information about the visual scene than decoding under the assumption of independence, and preserves 40% more visual information than optimal linear decoding. This model-based approach reveals the role of correlated activity in the retinal coding of visual stimuli, and provides a general framework for understanding the importance of correlated activity in populations of neurons.
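For orientation, the sketch below simulates the general class of model referred to here: a coupled generalized linear model in which each cell's conditional intensity depends on a stimulus filter, a post-spike filter and coupling filters from neighbouring cells. The filters and stimulus are random placeholders, not the parameters fitted to the retinal data in the study.

```python
# Schematic sketch of a coupled GLM spike-response model with placeholder filters.
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_bins, hist_len, dt = 4, 2000, 20, 0.001   # 1 ms bins

stimulus = rng.standard_normal(n_bins)
stim_filters = rng.standard_normal((n_cells, hist_len)) * 0.3
post_spike = -np.exp(-np.arange(hist_len) / 5.0) * 2.0          # refractory suppression
coupling = rng.standard_normal((n_cells, n_cells, hist_len)) * 0.05
baseline = np.full(n_cells, np.log(20.0))                       # ~20 spikes/s

spikes = np.zeros((n_cells, n_bins))
for t in range(hist_len, n_bins):
    stim_hist = stimulus[t - hist_len:t][::-1]
    spk_hist = spikes[:, t - hist_len:t][:, ::-1]
    for i in range(n_cells):
        drive = (baseline[i]
                 + stim_filters[i] @ stim_hist
                 + post_spike @ spk_hist[i]
                 + sum(coupling[i, j] @ spk_hist[j] for j in range(n_cells) if j != i))
        rate = np.exp(drive) * dt                # conditional intensity per bin
        spikes[i, t] = rng.poisson(min(rate, 10.0))

print("spike counts per cell:", spikes.sum(axis=1))
```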
Project description: Music involves different senses and is emotional in nature, and musicians show enhanced detection of audio-visual temporal discrepancies and enhanced emotion recognition compared to non-musicians. However, whether musical training produces these enhanced abilities or whether they are innate in musicians remains unclear. Thirty-one adult participants were randomly assigned to a music training, music listening, or control group, each completing a one-hour session per week for 11 weeks. The music training group received piano training, the music listening group listened to the same music, and the control group did their homework. Measures of audio-visual temporal discrepancy detection, facial expression recognition, autistic traits, depression, anxiety, stress and mood were completed and compared from the beginning to the end of training. ANOVA results revealed that only the music training group showed a significant improvement in the detection of audio-visual temporal discrepancies compared to the other groups, for both stimulus types (flash-beep and face-voice). However, music training did not improve emotion recognition from facial expressions compared to the control group, although it did reduce levels of depression, stress and anxiety compared to baseline. This RCT study provides the first evidence of a causal effect of music training on improved audio-visual perception that goes beyond the music domain.