Attentional gain control of ongoing cortical speech representations in a "cocktail party".
ABSTRACT: Normal listeners possess the remarkable perceptual ability to select a single speech stream among many competing talkers. However, few studies of selective attention have addressed the unique nature of speech as a temporally extended and complex auditory object. We hypothesized that sustained selective attention to speech in a multitalker environment would act as gain control on the early auditory cortical representations of speech. Using high-density electroencephalography and a template-matching analysis method, we found selective gain to the continuous speech content of an attended talker, greatest at a frequency of 4-8 Hz, in auditory cortex. In addition, the difference in alpha power (8-12 Hz) at parietal sites across hemispheres indicated the direction of auditory attention to speech, as has been previously found in visual tasks. The strength of this hemispheric alpha lateralization, in turn, predicted an individual's attentional gain of the cortical speech signal. These results support a model of spatial speech stream segregation, mediated by a supramodal attention mechanism, enabling selection of the attended representation in auditory cortex.
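The hemispheric alpha lateralization described above can be illustrated with a minimal numpy/scipy sketch. Everything here is invented for illustration (sampling rate, channel data, filter order); the index itself is just band power at a right parietal site versus a left parietal site, normalized to [-1, 1].

```python
import numpy as np
from scipy.signal import butter, filtfilt

def alpha_lateralization_index(eeg_left, eeg_right, fs=250.0, band=(8.0, 12.0)):
    """(right - left) / (right + left) alpha-band power across parietal sites.

    eeg_left / eeg_right: 1-D arrays from a left and a right parietal channel.
    Positive values indicate stronger alpha over the right hemisphere.
    """
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    p_left = np.mean(filtfilt(b, a, eeg_left) ** 2)
    p_right = np.mean(filtfilt(b, a, eeg_right) ** 2)
    return (p_right - p_left) / (p_right + p_left)

# Synthetic example: stronger 10 Hz alpha on the right parietal channel.
fs = 250.0
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
left = 0.5 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
right = 2.0 * np.sin(2 * np.pi * 10 * t) + rng.standard_normal(t.size)
ali = alpha_lateralization_index(left, right, fs)
```

In the synthetic example the right channel carries more alpha, so the index comes out positive, mimicking attention directed to one side.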
Project description: Impaired speech perception in noise despite normal peripheral auditory function is a common problem in young adults. Despite a growing body of research, the pathophysiology of this impairment remains unknown. This magnetoencephalography study characterizes the cortical tracking of speech in a multi-talker background in a group of highly selected adult subjects with impaired speech perception in noise without peripheral auditory dysfunction. Magnetoencephalographic signals were recorded from 13 subjects with impaired speech perception in noise (six females, mean age: 30 years) and matched healthy subjects while they listened to 5 different recordings of stories merged with a multi-talker background at different signal-to-noise ratios (no noise, +10, +5, 0 and -5 dB). The cortical tracking of speech was quantified with coherence between magnetoencephalographic signals and the temporal envelope of (i) the global auditory scene (i.e. the attended speech stream and the multi-talker background noise), (ii) the attended speech stream only and (iii) the multi-talker background noise. Functional connectivity was then estimated between the rest of the brain and those brain areas showing altered cortical tracking of speech in noise in subjects with impaired speech perception in noise. All participants demonstrated a selective cortical representation of the attended speech stream in noisy conditions, but subjects with impaired speech perception in noise displayed reduced cortical tracking of speech at the syllable rate (i.e. 4-8 Hz) in all noisy conditions. Increased functional connectivity was observed in subjects with impaired speech perception in noise, in both the noiseless and speech-in-noise conditions, between supratemporal auditory cortices and left-dominant brain areas involved in semantic and attention processes.
The difficulty in understanding speech in a multi-talker background in subjects with impaired speech perception in noise thus appears to be related to inaccurate auditory cortical tracking of speech at the syllable rate. The increased functional connectivity between supratemporal auditory cortices and language- and attention-related neocortical areas probably serves to support speech perception and subsequent recognition in adverse auditory scenes. Overall, this study argues for a central origin of impaired speech perception in noise in the absence of any peripheral auditory dysfunction.
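The coherence measure used above, between a cortical signal and the speech temporal envelope, can be sketched with scipy. The signals and numbers below are synthetic stand-ins (a 5 Hz "syllable-rate" envelope and a noisy channel that partially tracks it), not the study's actual pipeline.

```python
import numpy as np
from scipy.signal import coherence

fs = 200.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(1)

# Stand-in speech envelope fluctuating at the syllable rate (~5 Hz).
envelope = 0.5 * (1 + np.sin(2 * np.pi * 5 * t))

# Stand-in cortical (MEG-like) signal that partially tracks the envelope.
meg = 0.8 * envelope + rng.standard_normal(t.size)

# Welch-averaged magnitude-squared coherence, 2 s segments.
f, coh = coherence(meg, envelope, fs=fs, nperseg=int(2 * fs))
syllable_coh = coh[(f >= 4) & (f <= 8)].max()
```

Because the simulated channel genuinely tracks the envelope, coherence peaks in the 4-8 Hz syllable-rate band, the band in which the impaired-perception group showed reduced tracking.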
Project description: When selectively attending to a speech stream in multi-talker scenarios, low-frequency cortical activity is known to synchronize selectively to fluctuations in the attended speech signal. Older listeners with age-related sensorineural hearing loss (presbycusis) often struggle to understand speech in such situations, even when wearing a hearing aid. Yet, it is unclear whether a peripheral hearing loss degrades the attentional modulation of cortical speech tracking. Here, we used psychoacoustics and electroencephalography (EEG) in male and female human listeners to examine potential effects of hearing loss on EEG correlates of speech envelope synchronization in cortex. Behaviorally, older hearing-impaired (HI) listeners showed degraded speech-in-noise recognition and reduced temporal acuity compared with age-matched normal-hearing (NH) controls. During EEG recordings, we used a selective attention task with two spatially separated simultaneous speech streams, on which NH and HI listeners both showed high speech recognition performance. Low-frequency (<10 Hz) envelope-entrained EEG responses were enhanced in the HI listeners, not only for the attended speech but also for tone sequences modulated at slow rates (4 Hz) during passive listening. Compared with the attended speech, responses to the ignored stream were reduced in both HI and NH listeners, allowing the attended target to be classified from single-trial EEG data with similarly high accuracy in the two groups. However, despite robust attention-modulated speech entrainment, the HI listeners rated the competing speech task to be more difficult. These results suggest that speech-in-noise problems experienced by older HI listeners are not necessarily associated with degraded attentional selection. SIGNIFICANCE STATEMENT: People with age-related sensorineural hearing loss often struggle to follow speech in the presence of competing talkers.
It is currently unclear whether hearing impairment degrades the ability to use selective attention to suppress distracting speech in situations where the distractor is well segregated from the target. Here, we report amplified envelope-entrained cortical EEG responses to attended speech and to simple tones modulated at speech rates (4 Hz) in listeners with age-related hearing loss. Critically, despite increased self-reported listening difficulties, cortical synchronization to speech mixtures was robustly modulated by selective attention in listeners with hearing loss. This allowed the attended talker to be classified from single-trial EEG responses with high accuracy in both older hearing-impaired listeners and age-matched normal-hearing controls.
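A bare-bones version of classifying the attended talker from a single trial can be sketched as follows. This is a deliberate simplification with invented data: real decoders typically reconstruct the envelope from many EEG channels with a trained linear backward model before comparing it against the two talkers' envelopes, whereas here a single channel is correlated directly.

```python
import numpy as np

def classify_attended(eeg, env_a, env_b):
    """Return 0 if the EEG correlates more with talker A's envelope, else 1.

    Minimal stand-in for single-trial attention decoding: the talker whose
    speech envelope is tracked more strongly is taken to be the attended one.
    """
    r_a = np.corrcoef(eeg, env_a)[0, 1]
    r_b = np.corrcoef(eeg, env_b)[0, 1]
    return 0 if r_a > r_b else 1

# Synthetic trial: the EEG tracks the attended envelope more strongly.
rng = np.random.default_rng(2)
env_attended = rng.standard_normal(5000)
env_ignored = rng.standard_normal(5000)
eeg = 1.0 * env_attended + 0.3 * env_ignored + rng.standard_normal(5000)
choice = classify_attended(eeg, env_attended, env_ignored)
```

The attentional gain built into the simulation (stronger weight on the attended envelope) is exactly what makes the classification succeed, mirroring the reduced responses to the ignored stream reported above.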
Project description: Human listeners can focus on one speech stream out of several concurrent ones. The present study aimed to assess the whole-brain functional networks underlying (a) the process of focusing attention on a single speech stream vs. dividing attention between two streams and (b) speech processing at different time scales and depths. Two spoken narratives were presented simultaneously while listeners were instructed to (a) track and memorize the contents of a speech stream and (b) detect the presence of numerals or syntactic violations in the same ("focused attention condition") or in the parallel stream ("divided attention condition"). Speech content tracking was found to be associated with stronger connectivity in lower frequency bands (delta band, 0.5-4 Hz), whereas the detection tasks were linked with networks operating in the faster alpha (8-10 Hz) and beta (13-30 Hz) bands. These results suggest that the oscillation frequencies of the dominant brain networks during speech processing may be related to the duration of the time window within which information is integrated. We also found that focusing attention on a single speaker, compared to dividing attention between two concurrent speakers, was predominantly associated with connections involving the frontal cortices in the delta (0.5-4 Hz), alpha (8-10 Hz), and beta (13-30 Hz) bands, whereas dividing attention between two parallel speech streams was linked with stronger connectivity involving the parietal cortices in the delta and beta frequency bands. Overall, connections strengthened by focused attention may reflect control over information selection, whereas connections strengthened by divided attention may reflect the need to maintain two streams in parallel and the related control processes necessary for performing the tasks.
Project description: Oscillatory alpha-band activity (8-15 Hz) over parieto-occipital cortex in humans plays an important role in suppressing the processing of inputs at to-be-ignored regions of space, with increased alpha-band power observed over cortex contralateral to locations expected to contain distractors. It is unclear whether similar processes operate during deployment of spatial attention in other sensory modalities. Evidence from lesion patients suggests that parietal regions house supramodal representations of space. The parietal lobes are prominent generators of alpha oscillations, raising the possibility that alpha is a neural signature of supramodal spatial attention. Furthermore, when spatial attention is deployed within vision, processing of task-irrelevant auditory inputs at attended locations is also enhanced, pointing to automatic links between spatial deployments across senses. Here, we asked whether lateralized alpha-band activity is also evident in a purely auditory spatial-cueing task and whether it has the same underlying generator configuration as in a purely visuospatial task. If common to both sensory systems, this would provide strong support for "supramodal" attention theory. Alternatively, alpha-band differences between auditory and visual tasks would support a sensory-specific account. Lateralized shifts in alpha-band activity were indeed observed during a purely auditory spatial task. Crucially, there were clear differences in the scalp topographies of this alpha activity depending on the sensory system within which spatial attention was deployed. The findings suggest that parietally generated alpha-band mechanisms are central to attentional deployments across modalities but are invoked in a sensory-specific manner. The data support an "interactivity account," whereby a supramodal system interacts with sensory-specific control systems during deployment of spatial attention.
Project description: Auditory selective attention plays an essential role in identifying sounds of interest in a scene, but its neural underpinnings are still incompletely understood. Recent findings demonstrate that neural activity time-locked to a particular amplitude modulation (AM) is enhanced in the auditory cortex when the modulated stream of sounds is selectively attended to under sensory competition with other streams. However, the target sounds used in the previous studies differed not only in their AM, but also in other sound features, such as carrier frequency or location. Thus, it remains uncertain whether the observed enhancements reflect AM-selective attention. The present study aims to dissociate the effect of AM frequency on response enhancement in auditory cortex by using an ongoing auditory stimulus that contains two competing targets differing exclusively in their AM frequency. Electroencephalography results showed a sustained response enhancement for auditory attention compared to visual attention, but not for AM-selective attention (attended AM frequency vs. ignored AM frequency). In contrast, the response to the ignored AM frequency was enhanced, although a brief trend toward enhancement of the attended response occurred during the initial 15 s. Together with the previous findings, these observations indicate that selective enhancement of attended AMs in auditory cortex is adaptive under sustained AM-selective attention. This finding has implications for our understanding of cortical mechanisms for feature-based attentional gain control.
Project description: Two fundamental properties of perception are selective attention and perceptual contrast, but how these two processes interact remains unknown. Does an attended stimulus history exert a larger contrastive influence on the perception of a following target than unattended stimuli? Dutch listeners categorized target sounds with a reduced prefix "ge-" marking tense (e.g., ambiguous between gegaan-gaan "gone-go"). In 'single talker' Experiments 1-2, participants perceived the reduced syllable (reporting gegaan) when the target was heard after a fast sentence, but not after a slow sentence (reporting gaan). In 'selective attention' Experiments 3-5, participants listened to two simultaneous sentences from two different talkers, followed by the same target sounds, with instructions to attend to only one of the two talkers. Critically, the speech rates of attended and unattended talkers were found to influence target perception equally - even when participants could watch the attended talker speak. In fact, participants' target perception in 'selective attention' Experiments 3-5 did not differ from that of participants who were explicitly instructed to divide their attention equally across the two talkers (Experiment 6). This suggests that contrast effects of speech rate are immune to selective attention, largely operating prior to attentional stream segregation in the auditory processing hierarchy.
Project description: How can we concentrate on relevant sounds in noisy environments? A "gain model" suggests that auditory attention simply amplifies relevant and suppresses irrelevant afferent inputs. However, it is unclear whether this suffices when attended and ignored features overlap to stimulate the same neuronal receptive fields. A "tuning model" suggests that, in addition to gain, attention modulates feature selectivity of auditory neurons. We recorded magnetoencephalography, EEG, and functional MRI (fMRI) while subjects attended to tones delivered to one ear and ignored opposite-ear inputs. The attended ear was switched every 30 s to quantify how quickly the effects evolve. To produce overlapping inputs, the tones were presented alone vs. during white-noise masking notch-filtered ±1/6 octaves around the tone center frequencies. Amplitude modulation (39 vs. 41 Hz in opposite ears) was applied for "frequency tagging" of attention effects on maskers. Noise masking reduced early (50-150 ms; N1) auditory responses to unattended tones. In support of the tuning model, selective attention canceled out this attenuating effect but did not modulate the gain of 50-150 ms activity to nonmasked tones or steady-state responses to the maskers themselves. These tuning effects originated at nonprimary auditory cortices, purportedly occupied by neurons that, without attention, have wider frequency tuning than ±1/6 octaves. The attentional tuning evolved rapidly, during the first few seconds after attention switching, and correlated with behavioral discrimination performance. In conclusion, a simple gain model alone cannot explain auditory selective attention. In nonprimary auditory cortices, attention-driven short-term plasticity retunes neurons to segregate relevant sounds from noise.
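The "frequency tagging" idea used here, giving each ear's masker its own modulation rate so their steady-state responses can be separated in the spectrum, reduces to reading out spectral power at the tag frequencies. The sketch below uses invented amplitudes and a plain FFT rather than the study's actual analysis.

```python
import numpy as np

fs = 1000.0
t = np.arange(0, 10, 1 / fs)  # 10 s -> 0.1 Hz bins; 39 and 41 Hz fall on exact bins
rng = np.random.default_rng(3)

# Two maskers "tagged" by amplitude modulation at 39 Hz and 41 Hz; the
# stronger 39 Hz component stands in for a larger steady-state response.
signal = 1.5 * np.sin(2 * np.pi * 39 * t) + 0.7 * np.sin(2 * np.pi * 41 * t)
recording = signal + rng.standard_normal(t.size)

# Amplitude spectrum; each tag's response is read out at its own frequency bin.
spectrum = np.abs(np.fft.rfft(recording)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
p39 = spectrum[np.argmin(np.abs(freqs - 39.0))]
p41 = spectrum[np.argmin(np.abs(freqs - 41.0))]
```

Because the two tags occupy separate frequency bins, responses to simultaneously presented maskers in opposite ears can be quantified independently from one recording.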
Project description: A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by recording, with magnetoencephalography, from subjects selectively listening to one of two competing speakers, either of different or the same sex. Individual neural representations are seen for the speech of the two speakers, each selectively phase-locked to the rhythm of the corresponding speech stream, and from each of which the temporal envelope of that stream can be exclusively reconstructed. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensity of the attended and background speakers is separately varied over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker but not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
Project description: People affected by severe neurodegenerative diseases (e.g., late-stage amyotrophic lateral sclerosis (ALS) or locked-in syndrome) eventually lose all muscular control. Thus, they cannot use traditional assistive communication devices that depend on muscle control, or brain-computer interfaces (BCIs) that depend on the ability to control gaze. While auditory and tactile BCIs can provide communication to such individuals, their use typically entails an artificial mapping between the stimulus and the communication intent. This makes these BCIs difficult to learn and use. In this study, we investigated the use of selective auditory attention to natural speech as an avenue for BCI communication. In this approach, the user communicates by directing his/her attention to one of two simultaneously presented speakers. We used electrocorticographic (ECoG) signals in the gamma band (70-170 Hz) to infer the identity of the attended speaker, thereby removing the need to learn such an artificial mapping. Our results from twelve human subjects show that a single cortical location over superior temporal gyrus or premotor cortex is typically sufficient to identify the attended speaker within 10 s and with 77% accuracy (chance level: 50%). These results lay the groundwork for future studies that may determine the real-time performance of BCIs based on selective auditory attention to speech.
Project description: Human listeners can follow the voice of one speaker while several others are talking at the same time. This process requires segregating the speech streams from each other and continuously directing attention to the target stream. We investigated the functional brain networks underlying this ability. Two speech streams were presented simultaneously to participants, who followed one of them and detected targets within it (target stream). The loudness of the distractor speech stream varied over five levels: moderately softer, slightly softer, equal, slightly louder, or moderately louder than the attended stream. Performance measures showed that the most demanding task was the moderately softer distractor condition, which indicates that a softer distractor speech stream may receive more covert attention than louder distractors and therefore requires more cognitive resources. EEG-based measurement of functional connectivity between various brain regions revealed frequency-band-specific networks: (1) energetic masking (comparing the louder distractor conditions with the equal loudness condition) was predominantly associated with stronger connectivity between the frontal and temporal regions in the lower alpha (8-10 Hz) and gamma (30-70 Hz) bands; (2) informational masking (comparing the softer distractor conditions with the equal loudness condition) was associated with a distributed network between parietal, frontal, and temporal regions in the theta (4-8 Hz) and beta (13-30 Hz) bands. These results suggest the presence of distinct cognitive and neural processes for resolving interference from energetic vs. informational masking.
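Band-specific functional connectivity of the kind reported throughout these studies is often estimated with measures such as the phase-locking value (PLV); this is one common choice, not necessarily the estimator any of these studies used. The sketch below computes a theta-band PLV between two synthetic channels that share a 6 Hz drive, with all parameters invented for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_plv(x, y, fs, band):
    """Phase-locking value between two channels in a given frequency band.

    Band-pass both channels, extract instantaneous phase via the Hilbert
    transform, and measure the consistency of the phase difference (0..1).
    """
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    phase_x = np.angle(hilbert(filtfilt(b, a, x)))
    phase_y = np.angle(hilbert(filtfilt(b, a, y)))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

fs = 250.0
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(4)

# Two channels sharing a 6 Hz (theta-band) drive, plus independent noise.
shared = np.sin(2 * np.pi * 6 * t)
x = shared + 0.5 * rng.standard_normal(t.size)
y = shared + 0.5 * rng.standard_normal(t.size)

plv_theta = band_plv(x, y, fs, (4.0, 8.0))
plv_ctrl = band_plv(rng.standard_normal(t.size),
                    rng.standard_normal(t.size), fs, (4.0, 8.0))
```

The shared theta drive yields a PLV near 1, whereas two independent noise channels yield a value near 0, which is the contrast such connectivity analyses exploit.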