Auditory scene analysis: the sweet music of ambiguity.
ABSTRACT: In this review paper aimed at the non-specialist, we explore the use that neuroscientists and musicians have made of perceptual illusions based on ambiguity. The pivotal issue is auditory scene analysis (ASA), or what enables us to make sense of complex acoustic mixtures in order to follow, for instance, a single melody in the midst of an orchestra. In general, ASA uncovers the most likely physical causes that account for the waveform collected at the ears. However, the acoustical problem is ill-posed and it must be solved from noisy sensory input. Recently, the neural mechanisms implicated in the transformation of ambiguous sensory information into coherent auditory scenes have been investigated using so-called bistability illusions (where an unchanging ambiguous stimulus evokes a succession of distinct percepts in the mind of the listener). After reviewing some of those studies, we turn to music, which arguably provides some of the most complex acoustic scenes that a human listener will ever encounter. Interestingly, musicians do not always aim at making each physical source intelligible, but rather at expressing one or more melodic lines with a small or large number of instruments. By means of a few musical illustrations and by using a computational model inspired by neurophysiological principles, we suggest that this relies on a detailed (if perhaps implicit) knowledge of the rules of ASA and of its inherent ambiguity. We then put forward the opinion that some degree of perceptual ambiguity may participate in our appreciation of music.
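One schematic way to unpack the "ill-posed problem" framing above (a gloss for the non-specialist, not an equation taken from the review itself) is Bayesian:

```latex
\hat{S} = \arg\max_{S} \; p(S \mid x), \qquad
p(S \mid x) \propto p(x \mid S)\, p(S)
```

Here $x$ is the waveform collected at the ears, $S$ ranges over candidate scene interpretations, $p(x \mid S)$ is the likelihood that scene $S$ produced $x$, and $p(S)$ encodes prior knowledge about plausible sources. Bistability then corresponds to two interpretations $S_1 \neq S_2$ with $p(S_1 \mid x) \approx p(S_2 \mid x)$: neither wins decisively, and the percept alternates.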
Project description: Because musicians are trained to discern sounds within complex acoustic scenes, such as an orchestra playing, it has been hypothesized that musicianship improves general auditory scene analysis abilities. Here, we compared musicians and non-musicians in a behavioural paradigm using ambiguous stimuli, combining performance, reaction times and confidence measures. We used 'Shepard tones', for which listeners may report either an upward or a downward pitch shift for the same ambiguous tone pair. Musicians and non-musicians performed similarly on the pitch-shift direction task. In particular, both groups were at chance for the ambiguous case. However, groups differed in their reaction times and judgements of confidence. Musicians responded to the ambiguous case with long reaction times and low confidence, whereas non-musicians responded with fast reaction times and maximal confidence. In a subsequent experiment, non-musicians displayed reduced confidence for the ambiguous case when pure-tone components of the Shepard complex were made easier to discern. The results suggest an effect of musical training on scene analysis: we speculate that musicians were more likely to discern components within complex auditory scenes, perhaps because of enhanced attentional resolution, and thus discovered the ambiguity. For untrained listeners, stimulus ambiguity was not available to perceptual awareness. This article is part of the themed issue 'Auditory and visual scene analysis'.
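For readers who want to reproduce the ambiguity, the following Python sketch synthesizes a Shepard tone pair; the sampling rate, envelope centre, and number of octave components are illustrative choices, not the study's exact stimulus parameters.

```python
import numpy as np

def shepard_tone(semitone, sr=44100, dur=0.5, f_low=27.5,
                 n_octaves=9, centre=960.0, sigma_oct=1.0):
    """Octave-spaced sinusoids under a fixed Gaussian envelope on
    log-frequency: pitch chroma is well defined, pitch height is not."""
    t = np.arange(int(sr * dur)) / sr
    tone = np.zeros_like(t)
    for k in range(n_octaves):
        f = f_low * 2 ** (k + semitone / 12.0)
        # weight each component by its distance (in octaves) from the
        # envelope centre, so components fade in and out at the edges
        w = np.exp(-0.5 * (np.log2(f / centre) / sigma_oct) ** 2)
        tone += w * np.sin(2 * np.pi * f * t)
    return tone / np.max(np.abs(tone))

# A half-octave (6-semitone) step is maximally ambiguous: the same pair
# can be heard as shifting up or as shifting down.
pair = np.concatenate([shepard_tone(0), shepard_tone(6)])
```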
Project description: Lightness illusions are fundamental to human perception, and yet why we see them is still the focus of much research. Here we address the question by modelling not human physiology or perception directly, as is typically the case, but our natural visual world and the need for robust behaviour. Artificial neural networks were trained to predict the reflectance of surfaces in a synthetic ecology consisting of 3-D "dead-leaves" scenes under non-uniform illumination. The networks learned to solve this task accurately and robustly given only ambiguous sense data. In addition, and as a direct consequence of their experience, the networks also made systematic "errors" in their behaviour commensurate with human illusions, which include brightness contrast and assimilation; notably, assimilation (specifically White's illusion) only emerged when the virtual ecology included 3-D, as opposed to 2-D, scenes. Subtle variations in these illusions, also found in human perception, were observed, such as the asymmetry of brightness contrast. These data suggest that "illusions" arise in humans because (i) natural stimuli are ambiguous, and (ii) this ambiguity is resolved empirically by encoding the statistical relationship between images and scenes in past visual experience. Since resolving stimulus ambiguity is a challenge faced by all visual systems, a corollary of these findings is that human illusions must be experienced by all visual animals regardless of their particular neural machinery. The data also provide a more formal definition of illusion: the condition in which the true source of a stimulus differs from its most likely (and thus perceived) source. As such, illusions are not fundamentally different from non-illusory percepts, all being direct manifestations of the statistical relationship between images and scenes.
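A toy version of this training setup can be sketched in Python; the 2-D "leaves" generator, patch size, and network architecture below are simplified placeholders for the paper's 3-D synthetic ecology.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def make_scene(size=16):
    """Toy dead-leaves patch: overlapping random discs of random
    reflectance, lit by a smooth non-uniform illumination ramp. The
    network sees only the product reflectance * illumination."""
    yy, xx = np.mgrid[0:size, 0:size]
    refl = np.full((size, size), 0.5)
    for _ in range(20):  # paste random "leaves"
        cx, cy = rng.uniform(0, size, 2)
        r = rng.uniform(2, 6)
        refl[(xx - cx) ** 2 + (yy - cy) ** 2 < r ** 2] = rng.uniform(0.05, 0.95)
    g = rng.uniform(-1, 1, 2)  # random illumination gradient
    illum = 1.0 + 0.5 * (g[0] * xx + g[1] * yy) / size
    return refl * illum, refl[size // 2, size // 2]

# Train to report the reflectance of the centre surface from luminance alone.
scenes = [make_scene() for _ in range(3000)]
X = np.array([im.ravel() for im, _ in scenes])
y = np.array([label for _, label in scenes])
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300).fit(X, y)
# Probing `net` with classic contrast/assimilation displays is then the
# analogue of testing it for human-like lightness "illusions".
```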
Project description: Behavioral and neurophysiological transfer effects from music experience to language processing are well established, but it is currently unclear whether linguistic expertise (e.g., speaking a tone language) benefits the processing and perception of music. Here, we compare brainstem responses of English-speaking musicians/non-musicians and native speakers of Mandarin Chinese elicited by tuned and detuned musical chords, to determine if enhancements in subcortical processing translate to improvements in the perceptual discrimination of musical pitch. Relative to non-musicians, both musicians and Chinese listeners had stronger brainstem representation of the defining pitches of musical sequences. In contrast, two behavioral pitch discrimination tasks revealed that neither Chinese listeners nor non-musicians were able to discriminate subtle changes in musical pitch with the same accuracy as musicians. Pooled across all listeners, brainstem magnitudes predicted behavioral pitch discrimination performance, but considering each group individually, only musicians showed connections between neural and behavioral measures. No brain-behavior correlations were found for tone language speakers or non-musicians. These findings point to a dissociation between subcortical neurophysiological processing and behavioral measures of pitch perception in Chinese listeners. We infer that sensory-level enhancement of musical pitch information yields cognitive-level perceptual benefits only when that information is behaviorally relevant to the listener.
Project description: A recent study showed that when a sound mixture has ambiguous spectrotemporal structure, spatial cues alone are sufficient to change the balance of grouping cues and affect the perceptual organization of the auditory scene. The current study synthesizes similar stimuli in a reverberant setting to see whether the interaural decorrelation caused by reverberant energy reduces the influence of spatial cues on perceptual organization. Results suggest that reverberant spatial cues are less influential on perceptual segregation than anechoic spatial cues. In addition, results replicate an interesting finding from the earlier study, where an ambiguous tone that could logically belong to either a repeating tone sequence or a simultaneous harmonic complex can sometimes "disappear" and never be heard as part of the perceptual foreground, no matter which object a listener attends. As in the previous study, the perceived energy of the ambiguous element does not "trade" between the objects in a complex scene (i.e., the element does not necessarily contribute more to one object when it contributes less to a competing object). Results are consistent with the idea that the perceptual organization of an acoustic mixture depends on what object a listener attends.
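The core manipulation can be illustrated with a toy stimulus in Python; the frequencies and timing below are placeholders, not the study's actual parameters (and no reverberation or spatialization is applied here).

```python
import numpy as np

sr, dur = 44100, 0.1
t = np.arange(int(sr * dur)) / sr

def tone(f):
    # Hann-windowed sinusoid to avoid onset/offset clicks
    return np.sin(2 * np.pi * f * t) * np.hanning(t.size)

# A 400 Hz target is ambiguous by construction: it continues a repeating
# 400 Hz sequence AND fits as the 4th harmonic of a simultaneous 100 Hz
# complex (from which that harmonic has been omitted).
repeating = tone(400)
complex_minus_h4 = sum(tone(100 * k) for k in (1, 2, 3, 5, 6))
mixture = np.concatenate([repeating, repeating, complex_minus_h4 + tone(400)])
```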
Project description: Studies suggest that long-term music experience enhances the brain's ability to segregate speech from noise. However, evidence for musicians' "speech-in-noise (SIN) benefit" comes largely from simple figure-ground tasks rather than competitive, multi-talker scenarios that offer realistic spatial cues for segregation and engage binaural processing. We aimed to investigate whether musicians show perceptual advantages in cocktail party speech segregation in a competitive, multi-talker environment. We used the coordinate response measure (CRM) paradigm to measure speech recognition and localization performance in musicians vs. non-musicians in a simulated 3D cocktail party environment in an anechoic chamber. Speech was delivered through a 16-channel speaker array distributed around the horizontal soundfield surrounding the listener. Participants recalled the color, number, and perceived location of target callsign sentences. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (0, 1, 2, 3, 4, 6, or 8 simultaneous talkers). Musicians obtained faster and better speech recognition with up to eight simultaneous talkers and showed less noise-related decline in performance with increasing interferers than their non-musician peers. Correlations revealed associations between listeners' years of musical training, CRM recognition, and working memory; better working memory also correlated with better speech streaming. Basic (QuickSIN) but not more complex (speech streaming) SIN processing was still predicted by music training after controlling for working memory. Our findings confirm a relationship between musicianship and naturalistic cocktail party speech streaming, but also suggest that cognitive factors at least partially drive musicians' SIN advantage.
Project description: Pattern recognition on neural activations from naturalistic music listening has been successful at predicting neural responses of listeners from musical features, and vice versa. Inter-subject differences in decoding accuracy have arisen partly from musical training, which has widely recognized structural and functional effects on the brain. We propose and evaluate a decoding approach aimed at predicting the musicianship class of an individual listener from the dynamic neural processing of musical features. Whole-brain functional magnetic resonance imaging (fMRI) data were acquired from musicians and nonmusicians while they listened to three musical pieces from different genres. Six musical features, representing low-level (timbre) and high-level (rhythm and tonality) aspects of music perception, were computed from the acoustic signals, and classification into musicians and nonmusicians was performed on the musical-feature and parcellated fMRI time series. Cross-validated classification accuracy reached 77% with nine regions, comprising frontal and temporal cortical regions, caudate nucleus, and cingulate gyrus. The processing of high-level musical features at the right superior temporal gyrus was most influenced by listeners' musical training. The study demonstrates the feasibility of decoding musicianship from how individual brains listen to music, attaining accuracy comparable to current results from automated clinical diagnosis of neurological and psychological disorders.
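The abstract does not spell out the classifier, so the following Python sketch is only a generic stand-in: hypothetical per-subject features (e.g., region-by-feature couplings between the parcellated fMRI and musical-feature time series) fed to a cross-validated linear SVM, with random placeholder data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 36 hypothetical listeners x (9 regions x 6 musical features)
X = rng.normal(size=(36, 9 * 6))
y = np.repeat([0, 1], 18)  # 0 = nonmusician, 1 = musician

clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, y, cv=6)  # cross-validated accuracy
print(scores.mean())  # ~chance here, since the placeholder data is noise
```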
Project description: Perceptual and neurophysiological enhancements in linguistic processing in musicians suggest that domain-specific experience may enhance neural resources recruited for language-specific behaviors. In everyday situations, listeners are faced with extracting speech signals in degraded listening conditions. Here, we examine whether musical training provides resilience to the degradative effects of reverberation on subcortical representations of pitch and formant-related harmonic information of speech. Brainstem frequency-following responses (FFRs) were recorded from musicians and non-musician controls in response to the vowel /i/ in four different levels of reverberation and analyzed based on their spectro-temporal composition. For both groups, reverberation had little effect on the neural encoding of pitch but significantly degraded the neural encoding of formant-related harmonics (i.e., vowel quality), suggesting a differential impact on the source-filter components of speech. However, in quiet and across nearly all reverberation conditions, musicians showed more robust responses than non-musicians. Neurophysiologic results were confirmed behaviorally by comparing brainstem spectral magnitudes with perceptual measures of fundamental (F0) and first-formant (F1) frequency difference limens (DLs). For both types of discrimination, musicians obtained DLs that were 2-4 times better than those of non-musicians. Results suggest that musicians' enhanced neural encoding of acoustic features, an experience-dependent effect, is more resistant to reverberation degradation, which may explain their enhanced perceptual ability on behaviorally relevant speech and/or music tasks in adverse listening conditions.
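A common way to quantify the "brainstem spectral magnitudes" referred to above is the FFT magnitude of the FFR in a narrow band around a target frequency; a minimal Python sketch follows (band edges and example frequencies are illustrative, not the study's values).

```python
import numpy as np

def spectral_magnitude(ffr, sr, f_target, bw=10.0):
    """Mean FFT magnitude of an FFR within +/- bw Hz of f_target."""
    spec = np.abs(np.fft.rfft(ffr * np.hanning(ffr.size)))
    freqs = np.fft.rfftfreq(ffr.size, 1.0 / sr)
    band = (freqs >= f_target - bw) & (freqs <= f_target + bw)
    return spec[band].mean()

# e.g., pitch ("source") vs. formant-related ("filter") encoding:
# f0_mag = spectral_magnitude(ffr, sr, f_target=100.0)
# f1_mag = spectral_magnitude(ffr, sr, f_target=300.0)
```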
Project description: Long-term music training has been shown to affect different cognitive and perceptual abilities. However, it is less well known whether it can also affect the perception of emotion from music, especially purely rhythmic music. Hence, we asked a group of 16 non-musicians, 16 musicians with no drumming experience, and 16 drummers to judge the level of expressiveness, the valence (positive and negative), and the category of emotion perceived from 96 drumming improvisation clips (audio-only, video-only, and audio-video) that varied in several music features (e.g., musical genre, tempo, complexity, drummer's expressiveness, and drummer's style). Our results show that the level and type of music training influence the perceived expressiveness, valence, and emotion from solo drumming improvisation. Overall, non-musicians, non-drummer musicians, and drummers were affected differently by changes in some characteristics of the music performance; for example, musicians (with and without drumming experience) gave greater weight to the visual performance than non-musicians when making their emotional judgments. These findings suggest that besides influencing several cognitive and perceptual abilities, music training also affects how we perceive emotion from music.
Project description: Music exhibits structure at multiple scales, ranging from motifs to large-scale functional components. When inferring the structure of a piece, different listeners may attend to different temporal scales, which can result in disagreements when they describe the same piece. In the field of music informatics research (MIR), it is common to use corpora annotated with structural boundaries at different levels. By quantifying disagreements between multiple annotators, previous research has yielded several insights relevant to the study of music cognition. First, annotators tend to agree when structural boundaries are ambiguous. Second, this ambiguity seems to depend on musical features, time scale, and genre. Furthermore, it is possible to tune current annotation evaluation metrics to better align with these perceptual differences. However, previous work has not directly analyzed the effects of hierarchical structure because the existing methods for comparing structural annotations are designed for "flat" descriptions and do not readily generalize to hierarchical annotations. In this paper, we extend and generalize previous work on the evaluation of hierarchical descriptions of musical structure. We derive an evaluation metric which can compare hierarchical annotations holistically across multiple levels. Using this metric, we measure inter-annotator agreement on the multilevel annotations of two different music corpora, examine the influence of acoustic properties on hierarchical annotations, and evaluate existing hierarchical segmentation algorithms against the distribution of inter-annotator agreement.
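The flavor of such a holistic, multi-level comparison can be conveyed by a simplified Python sketch: compare, across annotations, the relative depth at which pairs of time frames first share a segment. This is our illustration of the general idea only; the published metric's exact normalization and sampling may differ.

```python
import numpy as np

def meet(levels, n):
    """Deepest level at which two frames share a segment label.
    `levels` lists flat segmentations, coarse to fine, one label per frame."""
    m = np.zeros((n, n), dtype=int)
    for depth, labels in enumerate(levels, start=1):
        m[labels[:, None] == labels[None, :]] = depth
    return m

# Two toy 2-level hierarchies over 6 frames (labels are arbitrary ints).
ref = [np.array([0, 0, 0, 1, 1, 1]), np.array([0, 0, 1, 2, 2, 3])]
est = [np.array([0, 0, 0, 0, 1, 1]), np.array([0, 1, 1, 2, 2, 3])]
M_ref, M_est = meet(ref, 6), meet(est, 6)

# Agreement: over frame triples (i, j, k) where the reference ranks j
# closer to i than k (in meet depth), how often does the estimate
# preserve that ordering?
agree = total = 0
for i in range(6):
    for j in range(6):
        for k in range(6):
            if M_ref[i, j] > M_ref[i, k]:
                total += 1
                agree += M_est[i, j] > M_est[i, k]
print(agree / total)
```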
Project description: The dataset was used to study the effect of 2 hours of western classical music concert performance on the peripheral blood microRNA transcriptome in professional musicians. Overall design: MicroRNAs were extracted from the peripheral blood of professional musicians (N=10) immediately before and after a 2-hour western classical music concert performance and sequenced using Illumina HiSeq. Differential expression analysis was then performed to compare pre- and post-performance microRNA expression.
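As a minimal stand-in for the paired pre/post analysis described above (real RNA-seq pipelines would typically use a count-based tool such as DESeq2 or edgeR, which the description does not name), here is a Python sketch applying a per-microRNA paired t-test with Benjamini-Hochberg correction to simulated data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, n = 200, 10                    # hypothetical: 200 microRNAs, 10 musicians
pre = rng.normal(5, 1, size=(m, n))           # log2 expression, pre-concert
post = pre + rng.normal(0, 0.3, size=(m, n))  # post-concert (no true effect)

# Paired test per microRNA, then BH false-discovery-rate adjustment
pvals = np.array([stats.ttest_rel(post[i], pre[i]).pvalue for i in range(m)])
order = np.argsort(pvals)
q = pvals[order] * m / (np.arange(m) + 1)
q = np.minimum.accumulate(q[::-1])[::-1]  # enforce monotone q-values
significant = order[q < 0.05]             # indices of differential microRNAs
```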