Perception of Water-Based Masking Sounds: A Long-Term Experiment in an Open-Plan Office.
ABSTRACT: A certain level of masking sound is necessary to control the disturbance caused by speech sounds in open-plan offices. The sound is usually provided by evenly distributed loudspeakers. Pseudo-random noise is often used as a source of artificial sound masking (PRMS). A recent laboratory experiment suggested that water-based masking sound (WBMS) could be more favorable than PRMS. The purpose of our study was to determine how employees perceived different WBMSs compared to PRMS. The experiment was conducted in an open-plan office of 77 employees who had been accustomed to working under PRMS (44 dB LAeq). The experiment consisted of five masking conditions: the original PRMS, four different WBMSs, and a return to the original PRMS. The exposure time of each condition was 3 weeks. The noise level was nearly equal between the conditions (43-45 dB LAeq), but the spectra and the nature of the sounds were very different. A questionnaire was completed at the end of each condition. Acoustic satisfaction was worse during the WBMSs than during the PRMS. The disturbance caused by three of the four WBMSs was larger than that of the PRMS. Several attributes describing the sound quality itself were in favor of PRMS. Colleagues' speech sounds were more disturbing during the WBMSs. None of the WBMSs produced better subjective ratings than PRMS. Although the first WBMS was comparable to the PRMS on several variables, the overall results do not support the use of WBMSs in office workplaces. Because the experiment suffered from some methodological weaknesses, definitive conclusions about the suitability of WBMSs cannot yet be drawn.
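The masking levels above are stated as LAeq, the A-weighted equivalent continuous sound level over the measurement interval. As an illustration only (not part of the study), a minimal sketch of the equivalent-level computation from pressure samples, assuming the samples have already been A-weighted:

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in air, 20 micropascals

def laeq(pressure: np.ndarray) -> float:
    """Equivalent continuous sound level (dB) from (A-weighted) pressure samples."""
    return 10.0 * np.log10(np.mean(pressure**2) / P_REF**2)

# A steady 1 kHz tone whose RMS pressure is ten times the reference sits at 20 dB.
t = np.linspace(0.0, 1.0, 48_000, endpoint=False)
tone = 10 * P_REF * np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)
print(round(laeq(tone)))  # -> 20
```

The energetic averaging is why a condition's LAeq is insensitive to how the energy is distributed in time, which is also why the WBMS and PRMS conditions could be matched at 43-45 dB LAeq while sounding very different.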
Project description:The equalization-cancellation model is often used to predict the binaural masking level difference. Previously, its application to speech in noise required separate knowledge of the speech and noise signals to maximize the signal-to-noise ratio (SNR). Here, a novel, blind equalization-cancellation model is introduced that can operate on the mixed signals. This approach does not require any assumptions about particular sound source directions. It uses different strategies for positive and negative SNRs, with switching between the two steered by a blind decision stage utilizing modulation cues. The output of the model is a single-channel signal with enhanced SNR, which we analyzed using the speech intelligibility index to compare speech intelligibility predictions. In a first experiment, the model was tested on experimental data obtained in a scenario with spatially separated target and masker signals. Predicted speech recognition thresholds were in good agreement with measured speech recognition thresholds, with a root-mean-square error of less than 1 dB. A second experiment investigated signals at positive SNRs, achieved using time-compressed and low-pass-filtered speech. The results demonstrated that binaural unmasking of speech occurs at positive SNRs and that the modulation-based switching strategy can predict the experimental results.
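The cancellation step at the heart of the equalization-cancellation family of models can be illustrated with a toy example (a sketch only; the blind model's SNR-dependent strategies and modulation-based decision stage are not reproduced here). With a masker that is identical at both ears and a target carrying an interaural delay, subtracting one ear signal from the other removes the masker entirely and leaves a comb-filtered copy of the target:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n = 16_000, 16_000  # sample rate and one second of samples

# Diotic masker (identical waveform at both ears) and a lateral target,
# modelled crudely as an interaural delay of 8 samples.
masker = rng.standard_normal(n)
target = np.sin(2 * np.pi * 500 * np.arange(n) / fs)
left = target + masker
right = np.roll(target, 8) + masker

# Cancellation: because the masker is already "equalized" across the ears,
# the ear difference removes it completely; only the target residue remains.
cancelled = left - right
assert np.allclose(cancelled, target - np.roll(target, 8))
```

In realistic mixtures the equalization stage must first estimate an interaural gain and delay before the subtraction pays off, which is exactly where the blind decision stage described above comes in.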
Project description:The human auditory system can segregate complex auditory scenes into a foreground component and a background, allowing us to listen to specific speech sounds within a mixture of sounds. Selective attention plays a crucial role in this process, colloquially known as the "cocktail party effect." It has not been possible to build a machine that can emulate this human ability in real time. Here, we have developed a framework for the implementation of a neuromorphic sound segregation algorithm on a Field-Programmable Gate Array (FPGA). This algorithm is based on the principle of temporal coherence and uses an attention signal to separate a target sound stream from background noise. Temporal coherence implies that auditory features belonging to the same sound source are coherently modulated and evoke highly correlated neural response patterns. The basis for this form of sound segregation is that responses from pairs of channels that are strongly positively correlated belong to the same stream, while channels that are uncorrelated or anti-correlated belong to different streams. In our framework, we use a neuromorphic cochlea as a frontend sound analyser to extract spatial information from the sound input, which then passes through band-pass filters that extract the sound envelope at various modulation rates. Further stages include feature extraction and mask generation, the output of which is finally used to reconstruct the target sound. Using sample tonal and speech mixtures, we show that our FPGA architecture is able to segregate sound sources in real time. The accuracy of segregation is indicated by the high signal-to-noise ratio (SNR) of the segregated stream (90, 77, and 55 dB for a simple tone, a complex tone, and speech, respectively) compared to the SNR of the mixture waveform (0 dB). This system may be easily extended to the segregation of complex speech signals, and may thus find various applications in electronic devices, such as sound segregation and speech recognition.
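The correlation rule described above can be sketched in a few lines (with hypothetical channel envelopes, not the FPGA pipeline): channels whose envelopes are strongly positively correlated with the attended channel are grouped into the target stream, the rest into the background.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n = 1_000, 4_000
t = np.arange(n) / fs

# Envelopes of four frequency channels: two comodulated with stream A (4 Hz),
# two comodulated with stream B (7 Hz, different phase), plus a little noise.
mod_a = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
mod_b = 0.5 * (1 + np.sin(2 * np.pi * 7 * t + 1.0))
envelopes = np.stack([mod_a, mod_a, mod_b, mod_b]) + 0.05 * rng.standard_normal((4, n))

# Temporal coherence: pairwise correlation of channel envelopes.
corr = np.corrcoef(envelopes)
attended = 0                  # attention signal selects channel 0 as the target
mask = corr[attended] > 0.5   # binary mask over channels
print(mask)  # -> [ True  True False False]
```

The binary channel mask here plays the role of the mask-generation stage in the abstract; applying it to the channel signals and resynthesizing would reconstruct the attended stream.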
Project description:Synthetic speech has been widely used in the study of speech cues. A serious disadvantage of this method is that it requires prior knowledge about the cues to be identified in order to synthesize the speech. Incomplete or inaccurate hypotheses about the cues often lead to speech sounds of low quality. In this research, a psychoacoustic method named three-dimensional deep search (3DDS) is developed to explore the perceptual cues of stop consonants in naturally produced speech. For a given sound, it measures the contribution of each subcomponent to perception by time-truncating, highpass/lowpass filtering, or masking the speech with white noise. The AI-gram, a visualization tool that simulates auditory peripheral processing, is used to predict the audible components of the speech sound. The results are generally in agreement with the classical finding that stops are characterized by a short-duration burst followed by an F2 transition, suggesting the effectiveness of the 3DDS method. However, it is also shown that /ba/ and /pa/ may have a wide-band click as the dominant cue, and that the F2 transition is not necessary for the perception of /ta/ and /ka/. Moreover, many stop consonants contain conflicting cues that are characteristic of competing sounds. The robustness of a consonant sound to noise is determined by the intensity of the dominant cue.
Project description:Binaural hearing helps normal-hearing listeners localize sound sources and understand speech in noise. However, it is not fully understood to what extent this is the case for bilateral cochlear implant (CI) users. To determine the potential benefits of bilateral over unilateral CIs, speech comprehension thresholds (SCTs) were measured in seven Japanese bilateral CI recipients using Helen test sentences (translated into Japanese) in a two-talker speech interferer presented from the front (co-located with the target speech), ipsilateral to the first-implanted ear (at +90° or -90°), and spatially symmetric at ±90°. Spatial release from masking was calculated as the difference between co-located and spatially separated SCTs. Localization was assessed in the horizontal plane by presenting male speech, female speech, or both simultaneously. All measurements were performed bilaterally and unilaterally (with the first-implanted ear) inside a loudspeaker array. Both SCTs and spatial release from masking improved with bilateral CIs, demonstrating mean bilateral benefits of 7.5 dB in the spatially asymmetric and 3 dB in the spatially symmetric speech mixtures. Localization performance varied strongly between subjects but was clearly improved with bilateral over unilateral CIs, with the mean localization error reduced by 27°. Surprisingly, adding a second talker had only a negligible effect on localization.
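Spatial release from masking, as defined above, is simply the difference between the co-located and spatially separated thresholds; a minimal sketch with hypothetical threshold values (not the study's data):

```python
def spatial_release(sct_colocated_db: float, sct_separated_db: float) -> float:
    """Spatial release from masking (dB): co-located minus separated threshold.

    A positive value means the listener tolerates a worse SNR once target and
    masker are spatially separated, i.e., a spatial benefit.
    """
    return sct_colocated_db - sct_separated_db

# Hypothetical bilateral-CI thresholds in dB SNR.
print(spatial_release(-2.0, -9.5))  # -> 7.5
```

The sign convention matters: because lower (more negative) thresholds indicate better performance, subtracting the separated from the co-located threshold yields a positive number when separation helps.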
Project description:Some of the most common interfering background sounds a listener experiences are the sounds of other talkers. In Experiment 1, recognition for natural Institute of Electrical and Electronics Engineers (IEEE) sentences was measured in normal-hearing adults at two fixed signal-to-noise ratios (SNRs) in 16 backgrounds with the same long-term spectrum: unprocessed speech babble (1, 2, 4, 8, and 16 talkers), noise-vocoded versions of the babbles (12 channels), noise modulated with the wide-band envelope of the speech babbles, and unmodulated noise. All talkers were adult males. For a given number of talkers, natural speech was always the most effective masker. The greatest changes in performance occurred as the number of talkers in the maskers increased from 1 to 2 or 4, with small changes thereafter. In Experiment 2, the same targets and maskers (1, 2, and 16 talkers) were used to measure speech reception thresholds (SRTs) adaptively. Periodicity in the target was also manipulated by noise-vocoding, which led to considerably higher SRTs. The greatest masking effect always occurred for the masker type most similar to the target, while the effects of the number of talkers were generally small. Implications are drawn with reference to glimpsing, informational vs energetic masking, overall SNR, and aspects of periodicity.
Project description:Speech-on-speech recognition differs substantially across stimuli, but it is unclear what role linguistic features of the masker play in this variability. The linguistic similarity hypothesis suggests that similarity between the sentence-level semantic content of the target and masker speech increases masking. Sentence recognition in a two-talker masker was evaluated with respect to the semantic content and syntactic structure of the masker (experiment 1) and the linguistic similarity of the target and masker (experiment 2). Target and masker sentences were semantically meaningful or anomalous. Masker syntax was varied or the same across sentences. When other linguistic features of the masker were controlled, variability in syntactic structure across masker tokens was only relevant when the masker was played continuously (as opposed to gated); when played continuously, sentence-recognition thresholds were poorer with variable than with consistent masker syntax, but this effect was small (0.5 dB). When the syntactic structure of the masker was held constant, semantic meaningfulness of the masker did not increase masking, and at times performance was better for the meaningful than for the anomalous masker. These data indicate that the sentence-level semantic content of the masker speech does not influence speech-on-speech masking. Further, no evidence was found that similarity between target and masker sentence-level semantic content increases masking.
Project description:Classic demonstrations of the phonemic restoration effect show increased intelligibility of interrupted speech when the interruptions are caused by a plausible masking sound rather than by silent periods. Previous studies of this effect have been conducted exclusively under anechoic or nearly anechoic listening conditions. This study demonstrates that the effect is reversed when sounds are presented in a realistically simulated reverberant room (broadband reverberation time T60 = 1.1 s): intelligibility is greater for silent interruptions than for interruptions by unmodulated noise. Additional results suggest that the reversal is primarily due to the filling of silent intervals with reverberant energy from the speech signal.
Project description:Background: The intrauterine hearing experience differs from the extrauterine hearing exposure within a neonatal intensive care unit (NICU) setting. Moreover, the listening experience of a neonate drastically differs from that of an adult. Several studies have documented that the sound level within a NICU far exceeds the recommended threshold, possibly contributing to subsequent hearing loss. The aim of this study was, first, to precisely define the dynamics of sounds within an incubator and, second, to give clinicians and caregivers an idea of what can be heard "inside the box." Methods: Audio recordings within an incubator were conducted at the Pediatric Simulation Center of the Medical University of Vienna. They contained recorded music, speech, and synthesized sounds. To understand the dynamics of sounds around and within the incubator, the following stimuli were used: broadband noise with sound level decreasing in 10 steps of 6 dB; sine waves (62.5, 125, 250, 500, 1000, 2000, 4000, 8000, and 16,000 Hz); a logarithmic sweep (chirp) over the frequency band 20 Hz to 21 kHz; a singing male voice; and a singing and a whispering female voice. Results: Our results confirm a protective effect of the incubator against noises above 500 Hz under "no-flow" conditions and show almost no protective effect of an incubator cover. We furthermore observed a strong boost of low frequencies below 125 Hz within the incubator, as well as a notable increase of higher-frequency noise with open access doors, a significant resonant effect of the incubator, and a considerable masking effect of the respiratory support against any other source of noise or sound stimulation, even under "low-flow" conditions. Conclusion: Our study reveals the high noise levels of air supply at high flow rates and the boost of low frequencies within the incubator. Education of medical staff and family members, as well as modifications of the physical environment, should aim at reducing the noise exposure of preterm infants in the incubator. Audiovisual material is provided as Supplementary Material.
Project description:A detrimental perceptual consequence of damaged auditory sensory hair cells is a pronounced masking effect exerted by low-frequency sounds, thought to occur when auditory threshold elevation substantially exceeds 40 dB. Here, we identified the submembrane scaffold protein Nherf1 as a hair-bundle component of differentiating outer hair cells (OHCs). Nherf1(-/-) mice displayed OHC hair-bundle shape anomalies in the mid and basal cochlea, normally tuned to mid- and high-frequency tones, and mild (22-35 dB) hearing-threshold elevations restricted to mid-high sound frequencies. This mild decrease in hearing sensitivity was, however, discordant with almost nonresponding OHCs at the cochlear base, as assessed by distortion-product otoacoustic emissions and cochlear microphonic potentials. Moreover, unlike in wild-type mice, responses of Nherf1(-/-) mice to high-frequency (20-40 kHz) test tones were not masked by tones of neighboring frequencies. Instead, efficient maskers were characterized by frequencies up to two octaves below the probe-tone frequency, unusually low intensities of up to 25 dB below probe-tone level, and a growth-of-masker slope (2.2 dB/dB) reflecting their compressive amplification. Together, these properties do not fit the currently acknowledged features of a hypersensitivity of the basal cochlea to lower frequencies, but rather suggest a previously unidentified mechanism. Low-frequency maskers, we propose, may interact within the unaffected cochlear apical region with mid-high-frequency sounds propagated there via a mode possibly exploiting the persistent contact of misshapen OHC hair bundles with the tectorial membrane. Our findings thus reveal a source of misleading interpretations of hearing thresholds and of hypervulnerability to low-frequency sound interference.
Project description:Functional near-infrared spectroscopy (fNIRS) is a non-invasive brain imaging technique that measures changes in oxygenated and deoxygenated hemoglobin concentration and can provide a measure of brain activity. In addition to neural activity, fNIRS signals contain components that can be used to extract physiological information, such as cardiac measures. Previous studies have shown changes in cardiac activity in response to different sounds. This study investigated whether cardiac responses collected using fNIRS differ with the loudness of sounds. fNIRS data were collected from 28 normal-hearing participants. Cardiac response measures evoked by broadband, amplitude-modulated sounds were extracted for four sound intensities ranging from near-threshold to comfortably loud levels (15, 40, 65, and 90 dB Sound Pressure Level (SPL)). Following onset of the noise stimulus, heart rate initially decreased for sounds of 15 and 40 dB SPL, reaching a significantly lower rate at 15 dB SPL. For sounds at 65 and 90 dB SPL, increases in heart rate were seen. To quantify the timing of significant changes, inter-beat intervals were assessed. For sounds at 40 dB SPL, an immediate significant change in the first two inter-beat intervals following sound onset was found. At other levels, the most significant change appeared later (beats 3 to 5 following sound onset). In conclusion, changes in heart rate were associated with the level of sound, with a clear difference in response to near-threshold sounds compared to comfortably loud sounds. These findings may be used alone or in conjunction with other measures, such as fNIRS brain activity, for the evaluation of hearing ability.
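The inter-beat-interval analysis described above amounts to differencing successive beat times; a sketch with hypothetical R-peak times (not the study's data), where lengthening intervals after sound onset correspond to heart-rate deceleration:

```python
import numpy as np

# Hypothetical beat (R-peak) times in seconds, extracted from the cardiac
# component of the fNIRS signal; sound onset is assumed at t = 0.
beat_times = np.array([0.00, 0.82, 1.65, 2.50, 3.40, 4.35])

ibi = np.diff(beat_times)   # inter-beat intervals in seconds
heart_rate = 60.0 / ibi     # instantaneous heart rate in beats per minute

# Growing intervals (0.82 s -> 0.95 s) indicate a decelerating heart rate,
# as reported for the near-threshold sound levels.
print(np.round(ibi, 2))
print(np.round(heart_rate, 1))
```

Assessing which of the first few intervals changes significantly, as in the study, would then reduce to comparing each beat-indexed interval against a pre-onset baseline across participants.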