A method for evaluating the relation between sound source segregation and masking.
ABSTRACT: Sound source segregation refers to the ability to hear two or more sound sources in a mixture as separate entities. Masking refers to the ability of one sound to make another sound difficult to hear. In many studies, masking is assumed to result from a failure of segregation, but this assumption is not always correct. Here a method is offered for identifying the relation between masking and sound source segregation, and an example of its application is given.
Project description: Tinnitus therapies are often combined with various sounds or noises. When masking external sounds, the spatial location of the masker is important; however, the effects of masker location on tinnitus are less well understood. We aimed to test whether the location of a masking sound affects the perception level of simulated tinnitus. A 4 kHz simulated tinnitus was induced in the right ear of healthy volunteers through an open-type earphone. To mask the simulated tinnitus, white noise was presented to the right ear using a single-sided headphone or from a speaker positioned on the right side at a distance of 1.8 m. In other sessions, monaurally recorded noise localized within the head (inside-head noise) or binaurally recorded noise localized outside the head (outside-head noise) was presented separately from a dual-sided headphone. The noise presented from the distant speaker and the outside-head noise masked the simulated tinnitus at a lower intensity than the noise beside the ear and the inside-head noise in 71.1% and 77.1% of measurements, respectively. In conclusion, spatial information about the masking noise may play a role in reducing the perception level of simulated tinnitus, and binaurally recorded sounds may be beneficial for acoustic tinnitus therapy.
Project description: The perception of two simultaneous tones was investigated in goldfish using classical respiratory conditioning and a stimulus generalization paradigm. Each two-tone mixture comprised a 150 Hz tone and either a higher harmonic or a mistuned harmonic. Fish were conditioned to the two-tone mixture and then tested for generalization to several pure tones. The simultaneous tones tended to be segregated in perception, with the generalization gradient for single tones having two peaks corresponding to the frequencies of the tone pair. There were no consistent differences in the generalization gradients following conditioning to harmonic or inharmonic tone pairs. In addition, experiments were carried out in which the two tones of the pair were heard on alternate trials, always as single tones, followed by generalization tests to single tones. There was more generalization in this experiment, reflecting the fact that conditioning and generalization test stimuli were both single tones. However, the shapes of the generalization gradients were similar to those obtained when fish were conditioned to two simultaneous tones, indicating that the simultaneity of the tones did not make them harder to segregate. As the frequency separation between the two components narrowed, segregation tended to fail.
Project description: Nonword pronunciation is a critical challenge for models of reading aloud, but little attention has been given to identifying the best method for assessing model predictions. The most typical approach compares the model's pronunciations of nonwords to pronunciations of the same nonwords by human participants, deeming the model's output correct if it matches any transcription of the human pronunciations. The present paper introduces a new ratings-based method, in which participants are shown printed nonwords and asked to rate the plausibility of the provided pronunciations, generated here by a speech synthesiser. We demonstrate this method with reference to a previously published database of 915 disyllabic nonwords (Mousikou et al., 2017). We evaluated two well-known psychological models, RC00 and CDP++, as well as an additional grapheme-to-phoneme algorithm known as Sequitur, and compared our model assessment with the corpus-based method adopted by Mousikou et al. We find that the ratings method: a) is much easier to implement than a corpus-based method, b) has a high hit rate and low false-alarm rate in assessing nonword reading accuracy, and c) provides a similar outcome to the corpus-based method in its assessment of RC00 and CDP++. However, the two methods differed in their evaluation of Sequitur, which performed much better under the ratings method. Indeed, our evaluation of Sequitur revealed that the corpus-based method introduced a number of false positives and, more often, false negatives. Implications of these findings are discussed.
Project description: Purpose: Learning to read is a complex, multifaceted process that relies on several speech- and language-related subskills. Word reading outcomes vary among children with inaccurate speech sound production, and some of these children develop later reading difficulties. Reports are inconsistent as to whether phonological deficits and/or weaknesses in oral language explain these subsequent reading difficulties, so it remains unclear how variability in speech production accuracy in early childhood may affect reading development. The present longitudinal study therefore seeks to clarify the relation between speech sound production accuracy in kindergarten and subsequent reading outcomes, with a focus on potential mediating factors. Method: Speech accuracy, core preliteracy skills (phonological awareness, rapid naming, and letter-name knowledge), and additional potential mediators (phonological memory and oral language abilities) were characterized at the start of formal reading instruction. Word reading, decoding, reading fluency, and comprehension were assessed at the end of second grade. Mediation analyses were conducted to examine factors that mediate the relation between speech accuracy in kindergarten and subsequent reading outcomes. Results: Early speech sound production accuracy was initially associated with subsequent reading outcomes; however, these associations were mediated by preliteracy skills (phonological awareness and letter-name knowledge) for word reading, decoding, and reading fluency. For reading comprehension, mediation effects of preliteracy and vocabulary skills were observed. Conclusions: The relation between speech sound production accuracy and subsequent word reading, decoding, reading fluency, and comprehension was mediated by preliteracy skills, specifically phonological awareness and letter-name knowledge. For reading comprehension only, vocabulary knowledge was of additional importance. Supplemental material: https://doi.org/10.23641/asha.23671491
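The mediation analyses described above follow the standard regression-based logic: the total effect of a predictor decomposes into a direct effect plus an indirect effect carried through the mediator. A minimal sketch with synthetic data (the variable names, effect sizes, and sample size below are hypothetical, not the study's):

```python
import random

random.seed(1)
n = 500
# Hypothetical standardized scores: x = speech accuracy,
# m = phonological awareness (mediator), y = word reading.
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.6 * xi + random.gauss(0, 1) for xi in x]                 # path a ~ 0.6
y = [0.2 * xi + 0.5 * mi + random.gauss(0, 1)                   # c' ~ 0.2, b ~ 0.5
     for xi, mi in zip(x, m)]

def centered(v):
    mu = sum(v) / len(v)
    return [vi - mu for vi in v]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

xc, mc, yc = centered(x), centered(m), centered(y)

a_path = dot(xc, mc) / dot(xc, xc)      # x -> mediator
c_total = dot(xc, yc) / dot(xc, xc)     # total effect of x on y

# Direct effect c' and mediator slope b from the joint regression
# y ~ x + m, solved via the 2x2 normal equations.
sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
sxy, smy = dot(xc, yc), dot(mc, yc)
det = sxx * smm - sxm ** 2
c_direct = (sxy * smm - smy * sxm) / det
b_path = (smy * sxx - sxy * sxm) / det

indirect = a_path * b_path
# In linear models the decomposition is an exact algebraic identity:
assert abs(c_total - (c_direct + indirect)) < 1e-9
```

The final assertion holds exactly for OLS fits on the same sample, which is why mediation effects of this kind can be read as "the association shrinks once the mediator is controlled for."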
Project description: We investigated potential cues to sound segregation by cochlear implant (CI) and normal-hearing (NH) listeners. In each presentation interval of experiment 1a, CI listeners heard a mixture of four pulse trains applied concurrently to separate electrodes, preceded by a "probe" applied to a single electrode. In one of these two intervals, which the subject had to identify, the probe electrode was the same as a "target" electrode in the mixture. The pulse train on the target electrode had a higher level than the others in the mixture. Additionally, it could be presented either with a 200-ms onset delay, at a lower rate, or with an asynchrony produced by delaying each pulse by about 5 ms relative to those on the nontarget electrodes. Neither the rate difference nor the asynchrony aided performance over and above the level difference alone, but the onset delay produced a modest improvement. Experiment 1b showed that two subjects could perform the task using the onset delay alone, with no level difference. Experiment 2 used a method similar to that of experiment 1, but investigated the onset cue using NH listeners. In one condition, the mixture consisted of harmonics 5 to 40 of a 100-Hz fundamental, with the onset of either harmonics 13 to 17 or 26 to 30 delayed relative to the rest. Performance was modest in this condition, but could be improved markedly by using stimuli containing a spectral gap between the target and nontarget harmonics. The results suggest that (a) CI users are unlikely to use temporal pitch differences between adjacent channels to separate concurrent sounds, and that (b) they can use onset differences between channels, but the usefulness of this cue will be compromised by the spread of excitation along the nerve-fiber array. This deleterious effect of spread of excitation can also impair the use of onset cues by NH listeners.
Project description: Impact sounds were synthesized according to standard textbook equations for the motion of simply supported metal plates. In a two-interval, forced-choice procedure, highly practiced listeners identified from these sounds a predefined class of target plates based on their particular material and geometric properties. The effects of two factors on identification were examined: the relative level of the partials comprising the sounds and the relative amount of information (given as the difference in d′) each partial provided for identification. In different conditions one factor was fixed while the other either increased or decreased with frequency. The effect on listener identification in each case was determined from a logistic discriminant analysis of trial-by-trial responses, yielding a vector of listener decision weights on the frequency and decay of individual partials. The weights increased proportionally with relative level, but were largely uninfluenced by relative information content, a result exactly opposite to that expected from a maximum-likelihood observer. The dominant effect of relative level was replicated for other sound sources (clamped bars and stretched membranes) and was not diminished by randomizing the relative level of partials across trials. The results are taken to underscore the importance of relative level in the identification of rudimentary sound sources.
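The per-partial information content above is quantified as a difference in d′, the equal-variance signal-detection sensitivity index. A minimal sketch of the standard computation (the example hit and false-alarm rates are made up for illustration):

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Equal-variance signal-detection sensitivity: d' = z(H) - z(F)."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(fa_rate)

# Equal hit and false-alarm rates mean zero sensitivity:
assert abs(d_prime(0.5, 0.5)) < 1e-12

print(round(d_prime(0.84, 0.16), 2))  # prints 1.99
```

A partial carrying more information for the plate-identification decision is one whose presence shifts d′ more; the study's point is that listeners' weights tracked relative level rather than this quantity.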
Project description: The segregation of sound sources from the mixture of sounds that enters the ear is a core capacity of human hearing, but the extent to which this process is dependent on attention remains unclear. This study investigated the effect of attention on the ability to segregate sounds via repetition. We utilized a dual task design in which stimuli to be segregated were presented along with stimuli for a "decoy" task that required continuous monitoring. The task to assess segregation presented a target sound 10 times in a row, each time concurrent with a different distractor sound. McDermott, Wrobleski, and Oxenham (2011) demonstrated that repetition causes the target sound to be segregated from the distractors. Segregation was queried by asking listeners whether a subsequent probe sound was identical to the target. A control task presented similar stimuli but probed discrimination without engaging segregation processes. We present results from 3 different decoy tasks: a visual multiple object tracking task, a rapid serial visual presentation (RSVP) digit encoding task, and a demanding auditory monitoring task. Load was manipulated by using high- and low-demand versions of each decoy task. The data provide converging evidence of a small effect of attention that is nonspecific, in that it affected the segregation and control tasks to a similar extent. In all cases, segregation performance remained high despite the presence of a concurrent, objectively demanding decoy task. The results suggest that repetition-based segregation is robust to inattention.
Project description: An increasing level of anthropogenic underwater noise (shipping, drilling, sonar use, etc.) impairs acoustic orientation and communication in fish by hindering signal transmission or detection. Different noise regimes can reduce the ability to detect sounds of conspecifics due to an upward shift of the hearing threshold, a phenomenon termed masking. We therefore investigated the masking effect of white noise on auditory thresholds in female croaking gouramis (Trichopsis vittata, Osphronemidae). We hypothesized that noise would influence the detection of conspecific vocalizations and thus acoustic communication. Auditory evoked potential (AEP) thresholds were measured at six frequencies between 0.1 and 4 kHz using the AEP recording technique. Sound pressure level audiograms were determined under quiet laboratory conditions (no noise) and under continuous white noise of 110 dB RMS. Thresholds increased in the presence of white noise at all tested frequencies by 12-18 dB, in particular at 1.5 kHz. Moreover, hearing curves were compared to spectra of conspecific sounds to assess sound detection in the presence of noise in various contexts. We showed that masking hinders the detection of conspecific sounds, which have their main energies between 1.0 and 1.5 kHz. We predict that this will particularly affect the hearing of females' low-intensity purring sounds during mating. Accordingly, noise will negatively affect acoustic communication and most likely reproductive success.
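A threshold shift of the kind reported above translates directly into lost detectability: a sound is audible only if its level reaches the quiet threshold plus the masking-induced shift. A toy sketch of this arithmetic (all specific levels below are hypothetical, not measurements from the study):

```python
def masked_threshold(quiet_threshold_db, shift_db):
    """Detection threshold in noise: quiet threshold plus the masking shift."""
    return quiet_threshold_db + shift_db

def detectable(signal_level_db, quiet_threshold_db, shift_db):
    """True if the signal reaches the (possibly masked) threshold."""
    return signal_level_db >= masked_threshold(quiet_threshold_db, shift_db)

# A hypothetical purring sound at 96 dB against a quiet threshold of 85 dB:
print(detectable(96, 85, 0))    # prints True  (audible in quiet)
print(detectable(96, 85, 15))   # prints False (a 15 dB shift pushes the
                                # threshold to 100 dB, above the signal)
```

This is why a 12-18 dB shift matters most for low-intensity signals such as the purring sounds: they sit closest to the quiet threshold and are the first to fall below the masked one.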
Project description: Introduction: There has been a surge in recent years in the use of social robots for providing information, persuasion, and entertainment in noisy public spaces. Considering the well-documented negative effect of noise on human cognition, masking sounds have been introduced. Masking sounds work, in principle, by making intrusive background speech less intelligible, and hence less distracting. However, this reduced distraction comes at the cost of increased annoyance and reduced cognitive performance in the users of masking sounds. Methods: A previous study showed that reducing the fundamental frequency of speech-shaped noise used as a masking sound makes it significantly less annoying and more efficient. In this study, the effectiveness of the proposed masking sound was tested on the performance of subjects listening to a lecture given by a social robot in a noisy cocktail party environment. Results: The results indicate that the presence of the masking sound significantly increased speech comprehension, perceived understandability, acoustic satisfaction, and sound privacy for individuals listening to the robot in an adverse listening condition. Discussion: To the knowledge of the authors, no previous work has investigated the application of sound masking technology in human-robot interaction designs. The future directions of this trend are discussed.
Project description: Objectives: To investigate factors influencing the effectiveness of intensive sound masking therapy for tinnitus using logistic regression analysis. Design: Retrospective cross-sectional analysis. Participants: 102 patients with tinnitus recruited at the Sun Yat-sen Memorial Hospital of Sun Yat-sen University, China. Intervention: Intensive sound masking therapy was used as the intervention for patients with tinnitus. Primary and secondary outcome measures: Participants underwent audiological investigations and tinnitus pitch and loudness matching measurements, followed by intensive sound masking therapy. The Tinnitus Handicap Inventory (THI) was used as the outcome measure pre and post treatment. Multivariate logistic regression was performed to investigate the association of demographic and audiological factors with effective therapy. Results: Based on the THI score changes pre and post sound masking intervention, 51 participants were categorised into an effective group and the remaining 51 into a non-effective group. Those in the effective group were significantly younger than those in the non-effective group (P=0.012), and significantly more participants in the effective group had flat audiogram configurations (P=0.04). Multivariable logistic regression analysis showed that age (OR=0.96, 95% CI 0.93 to 0.99, P=0.007), audiometric configuration (P=0.027) and pre-treatment THI score (OR=1.04, 95% CI 1.02 to 1.07, P<0.001) were significantly associated with therapeutic effectiveness. Further analysis showed that patients with flat audiometric configurations were 5.45 times more likely to respond to intervention than those with high-frequency steeply sloping audiograms (OR=5.45, 95% CI 1.67 to 17.86, P=0.005). Conclusions: Audiometric configuration, age and THI scores appear to be predictive of the effectiveness of sound masking treatment. Gender, tinnitus characteristics and hearing threshold measures do not appear to be related to treatment effectiveness. A further randomised controlled study is needed to confirm the value of these prognostic factors in tinnitus interventions.
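The odds ratios above read out on a multiplicative scale: an age OR of 0.96 per year compounds across years. A quick illustrative computation (the per-decade figure below is derived from the reported OR, not reported by the study itself):

```python
import math

or_per_year = 0.96                    # reported OR for age, per year
or_per_decade = or_per_year ** 10     # ORs compound multiplicatively

print(round(or_per_decade, 2))        # prints 0.66: a 10-year age increase
                                      # roughly thirds the odds of response

# Equivalently on the log-odds (regression-coefficient) scale:
beta = math.log(or_per_year)
assert abs(math.exp(beta * 10) - or_per_decade) < 1e-12
```

The same reading applies to the THI OR of 1.04 per point: higher pre-treatment handicap scores compound toward greater odds of a therapeutic response.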