Including Measures of High Gamma Power Can Improve the Decoding of Natural Speech From EEG.
ABSTRACT: The human auditory system is highly skilled at extracting and processing information from speech in both single-speaker and multi-speaker situations. A commonly studied speech feature is the amplitude envelope, which can also be used to determine which speaker a listener is attending to in multi-speaker situations. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) has shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such processing is strongly reflected in the power of high-frequency neural activity (around 70-150 Hz; known as high gamma). The first aim of this study was to determine whether high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Specifically, linear regression was used to investigate speech envelope and attention decoding in low-frequency EEG, high gamma power EEG, and in both EEG signals combined. The second aim was to assess whether the information reflected in high gamma power EEG is complementary to that reflected in well-established low-frequency EEG indices of speech processing. Exploratory analyses were also conducted to examine how low-frequency and high gamma power EEG may be sensitive to different features of the speech envelope. While low-frequency speech tracking was evident for almost all subjects, as expected, high gamma power also showed robust speech tracking in some subjects. The same pattern held for attention decoding in a separate group of subjects who participated in a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power EEG, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects.
Our results indicated that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects. Combining high gamma power and low-frequency EEG can improve the mapping between natural speech and the resulting neural responses.
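The linear-regression decoding described above is commonly implemented as a backward (stimulus-reconstruction) model: a ridge regression from time-lagged EEG channels to the speech envelope. A minimal numpy sketch of that approach (function names and the simulated data are illustrative, not the study's actual pipeline):

```python
import numpy as np

def lagged_matrix(eeg, lags):
    """Stack time-lagged copies of each EEG channel (samples x channels*lags)."""
    n, ch = eeg.shape
    X = np.zeros((n, ch * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(eeg, lag, axis=0)
        if lag > 0:
            shifted[:lag] = 0   # zero out samples wrapped around by np.roll
        elif lag < 0:
            shifted[lag:] = 0
        X[:, i * ch:(i + 1) * ch] = shifted
    return X

def fit_decoder(eeg, envelope, lags, alpha=1.0):
    """Ridge-regression decoder mapping lagged EEG to the speech envelope."""
    X = lagged_matrix(eeg, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ envelope)

def reconstruct(eeg, w, lags):
    """Reconstruct the envelope from EEG using fitted decoder weights."""
    return lagged_matrix(eeg, lags) @ w
```

Decoding quality is then typically quantified as the correlation between the reconstructed and actual envelopes on held-out data.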
Project description:This study investigates the dynamics of speech envelope tracking during speech production, listening, and self-listening. We use a paradigm in which participants listen to natural speech (Listening), produce natural speech (Speech Production), and listen to a playback of their own speech (Self-Listening), all while their neural activity is recorded with EEG. After synchronizing EEG data collection with the auditory recording and playback, we used a Gaussian copula mutual information measure to estimate the relationship between information content in the EEG and auditory signals. In the 2-10 Hz frequency range, we identified different latencies for maximal speech envelope tracking during speech production and speech perception. Maximal speech tracking takes place approximately 110 ms after auditory presentation during perception and 25 ms before vocalisation during speech production. These results describe a specific timeline for speech tracking in speakers and listeners, in line with the idea of a speech chain and the delays inherent in communication.
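The Gaussian copula mutual information measure mentioned above can be approximated by rank-transforming each signal to standard-normal margins and then applying the closed-form MI for Gaussian variables. A minimal sketch under that assumption (`copnorm` and `gcmi` are hypothetical names, not the authors' code):

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Map a sample to standard-normal values via its empirical ranks."""
    return norm.ppf(rankdata(x) / (len(x) + 1))

def gcmi(x, y):
    """Gaussian-copula mutual information (bits) between two 1-D signals."""
    gx, gy = copnorm(x), copnorm(y)
    r = np.corrcoef(gx, gy)[0, 1]
    return -0.5 * np.log2(1 - r ** 2)
```

The rank transform makes the estimate robust to the marginal distributions of the signals, while the Gaussian formula provides a lower bound on the true mutual information.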
Project description:Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to that of the irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained through selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction for the relevant instrument during a middle-latency window for both the bassoon and cello and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that, subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas no such enhancement is observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories on polyphonic music perception.
Project description:The human brain performs remarkably well in segregating a particular speaker from interfering ones in a multispeaker scenario. We can quantitatively evaluate this segregation capability by modeling the relationship between the speech signals present in an auditory scene and the listener's cortical signals measured using electroencephalography (EEG). This has opened up avenues to integrate neuro-feedback into hearing aids, where the device can infer the user's attention and enhance the attended speaker. Commonly used algorithms to infer auditory attention are based on linear systems theory, where cues such as speech envelopes are mapped onto the EEG signals. Here, we present a joint convolutional neural network (CNN)-long short-term memory (LSTM) model to infer auditory attention. Our joint CNN-LSTM model takes the EEG signals and the spectrograms of the multiple speakers as inputs and classifies the attention to one of the speakers. We evaluated the reliability of our network using three different datasets comprising 61 subjects, where each subject undertook a dual-speaker experiment. The three datasets corresponded to speech stimuli presented in three different languages, namely German, Danish, and Dutch. Using the proposed joint CNN-LSTM model, we obtained a median decoding accuracy of 77.2% at a trial duration of 3 s. Furthermore, we evaluated the amount of sparsity that the model can tolerate by means of magnitude pruning and found a tolerance of up to 50% sparsity without substantial loss of decoding accuracy.
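The linear-systems baseline mentioned above (mapping speech envelopes onto EEG signals) is commonly implemented by training a decoder on attended speech and then assigning attention to the speaker whose envelope correlates best with the reconstruction. A toy numpy sketch of that baseline, not the paper's CNN-LSTM (function names and simulated mixing are illustrative):

```python
import numpy as np

def train_decoder(eeg, attended_env, alpha=1.0):
    """Ridge decoder from EEG channels to the attended speech envelope."""
    return np.linalg.solve(eeg.T @ eeg + alpha * np.eye(eeg.shape[1]),
                           eeg.T @ attended_env)

def decode_attention(eeg, env_a, env_b, w):
    """Reconstruct the envelope from EEG and pick the speaker whose
    envelope correlates more strongly with the reconstruction."""
    rec = eeg @ w
    ra = np.corrcoef(rec, env_a)[0, 1]
    rb = np.corrcoef(rec, env_b)[0, 1]
    return 'a' if ra > rb else 'b'
```

Short decision windows make the correlation estimates noisy, which is exactly the regime where the paper's nonlinear CNN-LSTM approach aims to improve over this linear baseline.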
Project description:Schizophrenia is a complex disorder characterized by marked social dysfunctions, but the neural mechanism underlying this deficit is unknown. To investigate whether face-specific perceptual processes are altered in schizophrenia, both face detection and configural analysis were assessed in healthy individuals and schizophrenia patients by recording electroencephalogram (EEG) data. A face processing model was built based on frequency oscillations, and the evoked power (theta, alpha, and beta bands) and the induced power (gamma band) were measured while subjects passively viewed face and nonface images presented in upright and inverted orientations. The healthy adults showed a significant face-specific effect in the alpha, beta, and gamma bands, and an inversion effect was observed in the gamma band in the occipital lobe and right temporal lobe. Importantly, the schizophrenia patients showed face-specific deficits in the low-frequency beta and gamma bands, and the face inversion effect in the gamma band was absent from the occipital lobe. These results reveal face-specific processing deficits in patients linked to disrupted high-frequency EEG activity, providing additional evidence to enrich future studies of the underlying neural mechanisms and potentially serving as a diagnostic marker.
Project description:Difficulties in selectively attending to one among several speakers have mainly been associated with the distraction caused by ignored speech. Thus, in the current study, we investigated the neural processing of ignored speech in a two-competing-speaker paradigm. For this, we recorded participants' brain activity using electroencephalography (EEG) to track the neural representation of the attended and ignored speech envelopes. To provoke distraction, we occasionally embedded the participant's first name in the ignored speech stream. Retrospective reports as well as the presence of a P3 component in response to the name indicate that participants noticed the occurrence of their name. As predicted, the neural representation of the ignored speech envelope increased after the name was presented therein, suggesting that the name had attracted the participant's attention. Interestingly, in contrast to our hypothesis, the neural tracking of the attended speech envelope also increased after the name occurrence. We therefore conclude that the name may have distracted the participants only briefly, if at all, and instead alerted them to refocus on their actual task. These observations remained robust even when the sound intensity of the ignored speech stream, and thus the sound intensity of the name, was attenuated.
Project description:Learning foreign speech contrasts involves creating new representations of sound categories in memory. This formation of new memory representations is likely to involve changes in neural networks as reflected by oscillatory brain activity. To explore this, we conducted time-frequency analyses of electroencephalography (EEG) data recorded in a passive auditory oddball paradigm using Thai language tones. We compared native speakers of English (a non-tone language) and native speakers of Mandarin Chinese (a tone language), before and after a two-day laboratory training. Native English speakers showed larger gamma-band power and stronger alpha-band synchrony across EEG channels than the native Chinese speakers, especially after training. This is compatible with the view that forming new speech categories on the basis of unfamiliar perceptual dimensions involves stronger gamma activity and more coherent activity in alpha-band networks than forming new categories on the basis of familiar dimensions.
Project description:In a multi-speaker scenario, the human auditory system is able to attend to one particular speaker of interest and ignore the others. It has been demonstrated that it is possible to use electroencephalography (EEG) signals to infer to which speaker someone is attending by relating the neural activity to the speech signals. However, classifying auditory attention within a short time interval remains the main challenge. We present a convolutional neural network-based approach to extract the locus of auditory attention (left/right) without knowledge of the speech envelopes. Our results show that it is possible to decode the locus of attention within 1-2 s, with a median accuracy of around 81%. These results are promising for neuro-steered noise suppression in hearing aids, in particular in scenarios where per-speaker envelopes are unavailable.
Project description:The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one's cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
Project description:Heschl's gyrus (HG) is a brain area that includes the primary auditory cortex in humans. Due to the limitations in obtaining direct neural measurements from this region during naturalistic speech listening, the functional organization and the role of HG in speech perception remain uncertain. Here, we used intracranial EEG to directly record neural activity in HG in eight neurosurgical patients as they listened to continuous speech stories. We studied the spatial distribution of acoustic tuning and the organization of linguistic feature encoding. We found a main gradient of change from posteromedial to anterolateral parts of HG, along which frequency and temporal modulation tuning decreased while phonemic representation, speaker normalization, speech sensitivity, and response latency increased. We did not observe a difference between the two brain hemispheres. These findings reveal a functional role for HG in processing and transforming simple to complex acoustic features and inform neurophysiological models of speech processing in the human auditory cortex.
Project description:Neural processing along the ascending auditory pathway is often associated with a progressive reduction in characteristic processing rates. For instance, the well-known frequency-following response (FFR) of the auditory midbrain, as measured with electroencephalography (EEG), is dominated by frequencies from ~100 Hz to several hundred Hz, phase-locking to the acoustic stimulus at those frequencies. In contrast, cortical responses, whether measured by EEG or magnetoencephalography (MEG), are typically characterized by frequencies of a few Hz to a few tens of Hz, time-locking to acoustic envelope features. In this study, we investigated a crossover case: cortically generated responses time-locked to continuous speech features at FFR-like rates. Using MEG, we analyzed responses in the high gamma range of 70-200 Hz to continuous speech using neural source-localized reverse correlation and the corresponding temporal response functions (TRFs). Continuous speech stimuli were presented to 40 subjects (17 younger, 23 older adults) with clinically normal hearing, and their MEG responses were analyzed in the 70-200 Hz band. Consistent with the relative insensitivity of MEG to many subcortical structures, the spatiotemporal profile of these response components indicated a cortical origin with ~40 ms peak latency and a right hemisphere bias. TRF analysis was performed using two separate aspects of the speech stimuli: a) the 70-200 Hz carrier of the speech, and b) the 70-200 Hz temporal modulations in the spectral envelope of the speech stimulus. The response was dominantly driven by the envelope modulation, with a much weaker contribution from the carrier. Age-related differences were also analyzed to investigate a reversal previously seen along the ascending auditory pathway, whereby older listeners show weaker midbrain FFR responses than younger listeners but, paradoxically, stronger cortical low frequency responses.
In contrast to both these earlier results, this study did not find clear age-related differences in high gamma cortical responses to continuous speech. Cortical responses at FFR-like frequencies shared some properties with midbrain responses at the same frequencies and with cortical responses at much lower frequencies.
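The temporal response functions (TRFs) used above are typically estimated by regularized reverse correlation: regressing the neural response on time-lagged copies of a stimulus feature. A minimal single-channel sketch, assuming ridge regularization (the function name and simulated data are illustrative):

```python
import numpy as np

def estimate_trf(stimulus, response, lags, alpha=1.0):
    """Forward TRF via ridge-regularized reverse correlation:
    regress the neural response on time-lagged copies of the stimulus."""
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
    for i, lag in enumerate(lags):
        if lag > 0:
            X[:lag, i] = 0   # zero out samples wrapped around by np.roll
        elif lag < 0:
            X[lag:, i] = 0
    return np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)),
                           X.T @ response)
```

The resulting TRF is a filter whose peaks indicate at which latencies the stimulus feature drives the neural response, which is how latencies such as the ~40 ms peak reported above are read off.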