Auditory-motor expertise alters "speech selectivity" in professional musicians and actors.
ABSTRACT: Several perisylvian brain regions show preferential activation for spoken language above and beyond other complex sounds. These "speech-selective" effects might be driven by regions' intrinsic biases for processing the acoustical or informational properties of speech. Alternatively, such speech selectivity might emerge through extensive experience in perceiving and producing speech sounds. This functional magnetic resonance imaging (fMRI) study disambiguated such audiomotor expertise from speech selectivity by comparing activation for listening to speech and music in female professional violinists and actors. Audiomotor expertise effects were identified in several right and left superior temporal regions that responded to speech in all participants and music in violinists more than actresses. Regions associated with the acoustic/information content of speech were identified along the entire length of the superior temporal sulci bilaterally where activation was greater for speech than music in all participants. Finally, an effect of performing arts training was identified in bilateral premotor regions commonly activated by finger and mouth movements as well as in right hemisphere "language regions." These results distinguish the seemingly speech-specific neural responses that can be abolished and even reversed by long-term audiomotor experience.
Project description:The organization of human auditory cortex remains unresolved, due in part to the small stimulus sets common to fMRI studies and the overlap of neural populations within voxels. To address these challenges, we measured fMRI responses to 165 natural sounds and inferred canonical response profiles ("components") whose weighted combinations explained voxel responses throughout auditory cortex. This analysis revealed six components, each with interpretable response characteristics despite being unconstrained by prior functional hypotheses. Four components embodied selectivity for particular acoustic features (frequency, spectrotemporal modulation, pitch). Two others exhibited pronounced selectivity for music and speech, respectively, and were not explainable by standard acoustic features. Anatomically, music and speech selectivity concentrated in distinct regions of non-primary auditory cortex. However, music selectivity was weak in raw voxel responses, and its detection required a decomposition method. Voxel decomposition identifies primary dimensions of response variation across natural sounds, revealing distinct cortical pathways for music and speech.
Project description:This article concerns sound aesthetic preferences for European foreign languages. We investigated the phonetic-acoustic dimension of the linguistic aesthetic pleasure to describe the "music" found in European languages. The Romance languages, French, Italian, and Spanish, take the lead when people talk about melodious language - the music-like effects in the language (a.k.a. phonetic chill). On the other end of the melodiousness spectrum are German and Arabic, which are often considered to sound harsh and unattractive. Despite the public interest, limited research has been conducted on the topic of phonaesthetics, i.e., the subfield of phonetics that is concerned with the aesthetic properties of speech sounds (Crystal, 2008). Our goal is to fill the existing research gap by identifying the acoustic features that drive the auditory perception of language sound beauty. What is so music-like in the language that makes people say "it is music in my ears"? We had 45 central European participants listen to 16 auditorily presented European languages and rate each language in terms of 22 binary characteristics (e.g., beautiful - ugly and funny - boring), and indicate their language familiarities, L2 backgrounds, speaker voice liking, demographics, and musicality levels. Findings revealed that all factors in complex interplay explain a certain percentage of variance: familiarity and expertise in foreign languages, speaker voice characteristics, phonetic complexity, musical acoustic properties, and finally musical expertise of the listener. The most important discovery was the trade-off between speech tempo and so-called linguistic melody (pitch variance): the faster the language, the flatter/more atonal it is in terms of the pitch (speech melody), making it highly appealing acoustically (sounding beautiful and sexy), but not so melodious in a "musical" sense.
Project description:A fundamental question regarding music processing is its degree of independence from speech processing, in terms of their underlying neuroanatomy and influence of cognitive traits and abilities. Although a straight answer to that question is still lacking, a large number of studies have described where in the brain and in which contexts (tasks, stimuli, populations) this independence is, or is not, observed. We examined the independence between music and speech processing using functional magnetic resonance imaging and a stimulation paradigm with different human vocal sounds produced by the same voice. The stimuli were grouped as Speech (spoken sentences), Hum (hummed melodies), and Song (sung sentences); the sentences used in Speech and Song categories were the same, as well as the melodies used in the two musical categories. Each category had a scrambled counterpart which allowed us to render speech and melody unintelligible, while preserving global amplitude and frequency characteristics. Finally, we included a group of musicians to evaluate the influence of musical expertise. Similar global patterns of cortical activity were related to all sound categories compared to baseline, but important differences were evident. Regions more sensitive to musical sounds were located bilaterally in the anterior and posterior superior temporal gyrus (planum polare and temporale), the right supplementary and premotor areas, and the inferior frontal gyrus. However, only temporal areas and supplementary motor cortex remained music-selective after subtracting brain activity related to the scrambled stimuli. Speech-selective regions mainly affected by intelligibility of the stimuli were observed on the left pars opercularis and the anterior portion of the medial temporal gyrus. We did not find differences between musicians and non-musicians. Our results confirmed music-selective cortical regions in associative cortices, independent of previous musical training.
Project description:Regions of the human temporal lobe show greater activation for speech than for other sounds. These differences may reflect intrinsically specialized domain-specific adaptations for processing speech, or they may be driven by the significant expertise we have in listening to the speech signal. To test the expertise hypothesis, we used a video-game-based paradigm that tacitly trained listeners to categorize acoustically complex, artificial nonlinguistic sounds. Before and after training, we used functional MRI to measure how expertise with these sounds modulated temporal lobe activation. Participants' ability to explicitly categorize the nonspeech sounds predicted the change in pretraining to posttraining activation in speech-sensitive regions of the left posterior superior temporal sulcus, suggesting that emergent auditory expertise may help drive this functional regionalization. Thus, seemingly domain-specific patterns of neural activation in higher cortical regions may be driven in part by experience-based restructuring of high-dimensional perceptual space.
Project description:Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias, called the Iambic-Trochaic Law (ITL), has been claimed to be a universal property of the auditory system applying in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants' grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition.
Project description:Task-irrelevant speech or music sounds are known to disrupt verbal short-term memory even when participants are instructed to ignore the sound, suggesting that automatically processed acoustical changes interfere with the rehearsal of phonological items. However, much less is known about auditory distraction in tasks that require the memorization and recall of non-phonological auditory items. In the present study, both musically trained and untrained participants were asked to memorize random tone sequences (consisting of low, medium, and high pitch tones) while task-irrelevant sound was presented. Irrelevant instrumental music was found to produce more disruption of tonal recall than white noise, whereas irrelevant speech produced intermediate levels of disruption. In contrast, only speech produced significant interference in an analogous verbal recall task. Crucially, although musically trained participants were able to recall more tones in general, the degree of auditory distraction that was produced by irrelevant music in the tonal recall task was found to be independent of musical expertise. The findings are in line with the assumption of two separate mechanisms for the maintenance of tonal and phonological information. Specifically, short-term memory for tone sequences may rely on a pitch-based rehearsal system which is disrupted by the perception of irrelevant pitch changes as contained in instrumental music (and to a lesser extent in speech), whereas serial recall of verbal items is most sensitive to phonological sounds.
Project description:Emotional responses to biologically significant events are essential for human survival. Do human emotions lawfully track changes in the acoustic environment? Here we report that changes in acoustic attributes that are well known to interact with human emotions in speech and music also trigger systematic emotional responses when they occur in environmental sounds, including sounds of human actions, animal calls, machinery, or natural phenomena, such as wind and rain. Three changes in acoustic attributes known to signal emotional states in speech and music were imposed upon 24 environmental sounds. Evaluations of stimuli indicated that human emotions track such changes in environmental sounds just as they do for speech and music. Such changes not only influenced evaluations of the sounds themselves, they also affected the way accompanying facial expressions were interpreted emotionally. The findings illustrate that human emotions are highly attuned to changes in the acoustic environment, and reignite a discussion of Charles Darwin's hypothesis that speech and music originated from a common emotional signal system based on the imitation and modification of environmental sounds.
Project description:This tutorial provides a comprehensive overview of the methodological approach to collecting and analyzing auditory brain stem responses to complex sounds (cABRs). cABRs provide a window into how behaviorally relevant sounds such as speech and music are processed in the brain. Because temporal and spectral characteristics of sounds are preserved in this subcortical response, cABRs can be used to assess specific impairments and enhancements in auditory processing. Notably, subcortical auditory function is neither passive nor hardwired but dynamically interacts with higher-level cognitive processes to refine how sounds are transcribed into neural code. This experience-dependent plasticity, which can occur on a number of time scales (e.g., life-long experience with speech or music, short-term auditory training, on-line auditory processing), helps shape sensory perception. Thus, by being an objective and noninvasive means for examining cognitive function and experience-dependent processes in sensory activity, cABRs have considerable utility in the study of populations where auditory function is of interest (e.g., auditory experts such as musicians, and persons with hearing loss, auditory processing, and language disorders). This tutorial is intended for clinicians and researchers seeking to integrate cABRs into their clinical or research programs.
Project description:Auditory and visual signals generated by a single source tend to be temporally correlated, such as the synchronous sounds of footsteps and the limb movements of a walker. Continuous tracking and comparison of the dynamics of auditory-visual streams is thus useful for the perceptual binding of information arising from a common source. Although language-related mechanisms have been implicated in the tracking of speech-related auditory-visual signals (e.g., speech sounds and lip movements), it is not well known what sensory mechanisms generally track ongoing auditory-visual synchrony for non-speech signals in a complex auditory-visual environment. To begin to address this question, we used music and visual displays that varied in the dynamics of multiple features (e.g., auditory loudness and pitch; visual luminance, color, size, motion, and organization) across multiple time scales. Auditory activity (monitored using auditory steady-state responses, ASSR) was selectively reduced in the left hemisphere when the music and dynamic visual displays were temporally misaligned. Importantly, ASSR was not affected when attentional engagement with the music was reduced, or when visual displays presented dynamics clearly dissimilar to the music. These results appear to suggest that left-lateralized auditory mechanisms are sensitive to auditory-visual temporal alignment, but perhaps only when the dynamics of auditory and visual streams are similar. These mechanisms may contribute to correct auditory-visual binding in a busy sensory environment.
Project description:The strong association between music and speech has been supported by recent research focusing on musicians' superior abilities in second language learning and neural encoding of foreign speech sounds. However, evidence for a double association--the influence of linguistic background on music pitch processing and disorders--remains elusive. Because languages differ in their usage of elements (e.g., pitch) that are also essential for music, a unique opportunity for examining such language-to-music associations comes from a cross-cultural (linguistic) comparison of congenital amusia, a neurogenetic disorder affecting the music (pitch and rhythm) processing of about 5% of the Western population. In the present study, two populations (Hong Kong and Canada) were compared. One spoke a tone language in which differences in voice pitch correspond to differences in word meaning (in Hong Kong Cantonese, /si/ means 'teacher' and 'to try' when spoken in a high and mid pitch pattern, respectively). Using the On-line Identification Test of Congenital Amusia, we found Cantonese speakers as a group tend to show enhanced pitch perception ability compared to speakers of Canadian French and English (non-tone languages). This enhanced ability occurs in the absence of differences in rhythmic perception and persists even after relevant factors such as musical background and age were controlled. Following a common definition of amusia (5% of the population), we found Hong Kong pitch amusics also show enhanced pitch abilities relative to their Canadian counterparts. These findings not only provide critical evidence for a double association of music and speech, but also argue for the reconceptualization of communicative disorders within a cultural framework. 
Along with recent studies documenting cultural differences in visual perception, our auditory evidence challenges the common assumption of universality of basic mental processes and speaks to the domain generality of culture-to-perception influences.