Orangutans show active voicing through a membranophone.
ABSTRACT: Active voicing - voluntary control over vocal fold oscillation - is essential for speech. Nonhuman great apes can learn new consonant- and vowel-like calls, but active voicing by our closest relatives has historically been the most difficult capacity to demonstrate. To resolve this controversy, we devised a diagnostic test for active voicing using a membranophone: a musical instrument in which the player's voice sets a membrane vibrating through oscillating air pressure. We gave six orangutans the opportunity to use a membranophone (with no effective training); three of them produced a priori novel (species-atypical), individual-specific vocalizations. After 11 and 34 min, respectively, two subjects succeeded in producing their novel vocalizations into the instrument, confirming active voicing. Beyond expectation, within less than one hour both subjects found opposite strategies to significantly alter their voice duration and frequency to better activate the membranophone, further demonstrating plastic voice control as a result of experience with the instrument. These results highlight how individual differences in vocal proficiency between great apes may affect performance in experimental tests. Failing to adjust a test's difficulty to an individual's vocal skill may lead to false negatives, which may largely have been the case in past studies now cited as "textbook fact" for great apes' "missing" vocal capacities. Our results differ qualitatively from the small changes that intensive, months-long conditional training can induce in innate monkey calls. Our findings verify that active voicing beyond the species-typical repertoire, which in our own species underpins the acquisition of new voiced speech sounds, is not uniquely human among great apes.
Project description: Many nonhuman primates produce food-associated vocalizations upon encountering or ingesting particular foods. Among the great apes, only the food-associated vocalizations of chimpanzees (Pan troglodytes) and bonobos (Pan paniscus) have been studied in detail, providing evidence that these vocalizations can be produced flexibly in relation to a variety of factors, such as the quantity and quality of food and/or the type of audience. Only anecdotal evidence exists of eastern (Gorilla beringei) and western gorillas (Gorilla gorilla) producing food-associated vocalizations, termed "singing" or "humming". To enable a better understanding of the context in which these calls are produced, we investigated and compared the vocal behavior of two free-ranging groups of western lowland gorillas (Gorilla g. gorilla) at Mondika, Republic of Congo. Our results show that (a) food-associated call production occurs only during feeding and not in other contexts; (b) calling is not uniformly distributed across age and sex classes; (c) calls are produced only during feeding on specific foods; and (d) normally just one individual calls during a group feeding session, although certain food types elicit simultaneous calling by two or more individuals. Our findings provide new insight into the vocal abilities of gorillas and also carry larger implications for questions concerning vocal variability among the great apes. Food-associated calls of nonhuman primates have been shown to be flexible in terms of when they are used and at whom they are directed, making them interesting vocalizations from the viewpoint of language evolution. Food-associated vocalizations in great apes offer new opportunities to investigate the phylogenetic development of vocal communication within the primate lineage and can possibly contribute novel insights into the origins of human language.
Project description: Fundamental frequency (F0, perceived as voice pitch) predicts sex and age, hormonal status, mating success and a range of social traits, and thus functions as an important biosocial marker in modal speech. Yet, the role of F0 in human nonverbal vocalizations remains unclear, and given considerable variability in F0 across call types, it is not known whether F0 cues to vocalizer attributes are shared across speech and nonverbal vocalizations. Here, using a corpus of vocal sounds from 51 men and women, we examined whether individual differences in F0 are retained across neutral speech, valenced speech and nonverbal vocalizations (screams, roars and pain cries). Acoustic analyses revealed substantial variability in F0 across vocal types, with mean F0 increasing as much as 10-fold in screams compared to speech in the same individual. Despite these extreme pitch differences, sexual dimorphism was preserved within call types and, critically, inter-individual differences in F0 correlated across vocal types (r = 0.36-0.80) with stronger relationships between vocal types of the same valence (e.g. 38% of the variance in roar F0 was predicted by aggressive speech F0). Our results indicate that biologically and socially relevant indexical cues in the human voice are preserved in simulated valenced speech and vocalizations, including vocalizations characterized by extreme F0 modulation, suggesting that voice pitch may function as a reliable individual and biosocial marker across disparate communication contexts.
Project description: The evolutionary origins of speech remain obscure. Recently, it was proposed that speech derived from monkey facial signals, which exhibit a speech-like rhythm of ~5 open-close lip cycles per second. In monkeys, these signals may also be vocalized, offering a plausible evolutionary stepping stone towards speech. Three essential predictions remain, however, to be tested to assess this hypothesis' validity: (i) great apes, our closest relatives, should likewise produce ~5 Hz rhythmic signals; (ii) speech-like rhythm should involve calls articulatorily similar to consonants and vowels, given that speech rhythm is the direct product of stringing together these two basic elements; and (iii) speech-like rhythm should be experience-based. Via cinematic analyses, we demonstrate that an ex-entertainment orangutan produces two calls at a speech-like rhythm, coined "clicks" and "faux-speech." Like voiceless consonants, clicks required no vocal fold action but did involve independent manoeuvring of the lips and tongue. Paralleling vowels, faux-speech showed harmonic and formant modulations, implying vocal fold and supralaryngeal action. This rhythm was several times faster than orangutan chewing rates, as observed in monkeys and humans. Critically, this rhythm was seven-fold faster than, and contextually distinct from, any other known rhythmic calls described to date in the largest database of the orangutan repertoire ever assembled. The first two predictions advanced by this study are validated and, based on parsimony and the exclusion of potential alternative explanations, initial support is given to the third. Irrespective of the putative origins of these calls and their underlying mechanisms, our findings demonstrate irrevocably that great apes are not respiratorily, articulatorily, or neurologically constrained for the production of consonant- and vowel-like calls at speech rhythm.
Orangutan clicks and faux-speech confirm the importance of rhythmic speech antecedents within the primate lineage, and highlight potential articulatory homologies between great ape calls and human consonants and vowels.
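The rhythm rate itself can be estimated from a frame-by-frame lip-aperture trace of the kind cinematic analysis yields. A minimal Python sketch on a synthetic ~5 Hz trace (the signal, frame rate, and zero-crossing method are illustrative assumptions, not the study's pipeline):

```python
import math

# Synthetic lip-aperture trace: a 5 Hz open-close cycle sampled at 30 fps
# (values and method are illustrative; this is not the study's pipeline).
fps, seconds, rate_hz = 30, 2.0, 5.0
n = int(fps * seconds)
aperture = [math.sin(2 * math.pi * rate_hz * k / fps + 0.5) for k in range(n)]

# Centre the trace, then count zero crossings; each full open-close cycle
# contributes two crossings.
m = sum(aperture) / n
centred = [a - m for a in aperture]
crossings = sum(1 for x, y in zip(centred, centred[1:]) if x * y < 0)
estimate = crossings / 2 / seconds
print(f"estimated rhythm: {estimate:.2f} Hz")
```

Zero-crossing counting slightly underestimates at trace edges; on real footage a spectral method over the aperture signal would be more robust.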
Project description: Knowledge about vocal ontogeny, and vocal plasticity during ontogeny, in primate species is central to understanding the evolution of human speech. Vocalizations in gibbons (Hominoidea) are particularly interesting, containing complex species- and sex-specific patterns. However, the ontogeny of gibbon songs is little studied. Here, we document the regular production and ontogenetic change of the female-specific "great call" in four immature males (two juveniles, ca. 3 years old, and two adolescents, ca. 5 years old) of the southern yellow-cheeked gibbon (Nomascus gabriellae) over nine months. None of the males produced a fully developed, adult-like great call, and few ontogenetic changes to great calls occurred. Great calls of sons were shorter, started higher, and ended lower than those of their mothers. Regular production of the twitter part of the great call likely appears around the fourth year, as it was observed in the adolescent but not the juvenile males.
Project description: Previous research employing a real-time auditory perturbation paradigm has shown that talkers monitor their own speech attributes such as fundamental frequency, vowel intensity, vowel formants, and fricative noise as part of speech motor control. In the case of vowel formants or fricative noise, what was manipulated was spectral information about the filter function of the vocal tract. However, segments can be contrasted by parameters other than spectral configuration. It is possible that the feedback system monitors phonation timing in the way it does spectral information. This study examined whether talkers exhibit compensatory behavior when information about voicing is manipulated. When talkers received feedback of the cognate of the intended voicing category (saying "tipper" while hearing "dipper" or vice versa), they changed their voice onset time and, in some cases, the following vowel.
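The compensation logic can be sketched as a simple trial-by-trial feedback loop: if the perturbed feedback crosses the voicing-category boundary, production shifts to restore the intended category. The boundary value, step size, and update rule below are invented for illustration and are not the study's model:

```python
# Toy model of compensatory voice onset time (VOT) adjustment. The ~30 ms
# /d/-/t/ boundary and 5 ms step are assumed values for illustration only.
BOUNDARY_MS = 30.0   # assumed voiced/voiceless category boundary
STEP_MS = 5.0        # assumed per-trial compensation step

def next_vot(produced_vot, perturbation, intend_voiceless):
    """Return next trial's VOT given perturbed auditory feedback."""
    heard = produced_vot + perturbation
    heard_voiceless = heard >= BOUNDARY_MS
    if heard_voiceless == intend_voiceless:
        return produced_vot           # feedback matches intent: no change
    # push VOT opposite to the perturbation to restore the intended category
    return produced_vot + (STEP_MS if intend_voiceless else -STEP_MS)

vot = 45.0                            # intending "tipper" (voiceless /t/)
for _ in range(5):                    # feedback shifted toward "dipper"
    vot = next_vot(vot, perturbation=-20.0, intend_voiceless=True)
print(vot)
```

The produced VOT settles just high enough that the shifted feedback lands back in the intended voiceless category, mirroring the direction of compensation the study reports.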
Project description: Despite widespread evidence that nonverbal components of human speech (e.g., voice pitch) communicate information about physical attributes of vocalizers and that listeners can judge traits such as strength and body size from speech, few studies have examined the communicative functions of human nonverbal vocalizations (such as roars, screams, grunts and laughs). Critically, no previous study has examined the acoustic correlates of strength in nonverbal vocalizations, including roars, nor identified reliable vocal cues to strength in human speech. In addition to being less acoustically constrained than articulated speech, agonistic nonverbal vocalizations function primarily to express motivation and emotion, such as threat, and may therefore communicate strength and body size more effectively than speech. Here, we investigated acoustic cues to strength and size in roars compared to screams and speech sentences produced in both aggressive and distress contexts. Using playback experiments, we then tested whether listeners can reliably infer a vocalizer's actual strength and height from roars, screams, and valenced speech equivalents, and which acoustic features predicted listeners' judgments. While there were no consistent acoustic cues to strength in any vocal stimuli, listeners accurately judged inter-individual differences in strength, and did so most effectively from aggressive voice stimuli (roars and aggressive speech). In addition, listeners more accurately judged strength from roars than from aggressive speech. In contrast, listeners' judgments of height were most accurate for speech stimuli. These results support the prediction that vocalizers maximize impressions of physical strength in aggressive compared to distress contexts, and that inter-individual variation in strength may only be honestly communicated in vocalizations that function to communicate threat, particularly roars.
Thus, in continuity with nonhuman mammals, the acoustic structure of human aggressive roars may have been selected to communicate, and to some extent exaggerate, functional cues to physical formidability.
Project description: The study of vocal communication in animal models provides key insight into the neurogenetic basis of speech and communication disorders. Current methods for vocal analysis suffer from a lack of standardization, creating ambiguity in cross-laboratory and cross-species comparisons. Here, we present VoICE (Vocal Inventory Clustering Engine), an approach to grouping vocal elements by creating a high-dimensionality dataset through scoring spectral similarity between all vocalizations within a recording session. This dataset is then subjected to hierarchical clustering, generating a dendrogram that is pruned into meaningful vocalization "types" by an automated algorithm. When applied to birdsong, a key model for vocal learning, VoICE captures the known deterioration in acoustic properties that follows deafening, including altered sequencing. In a mammalian neurodevelopmental model, we uncover a reduced vocal repertoire of mice lacking the autism susceptibility gene, Cntnap2. VoICE will be useful to the scientific community as it can standardize vocalization analyses across species and laboratories.
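The VoICE pipeline - pairwise spectral similarity, hierarchical clustering, automated pruning into "types" - can be sketched in miniature. The similarity matrix, average-linkage rule, and cutoff below are illustrative stand-ins, not the published implementation:

```python
# Toy sketch of similarity-based agglomerative clustering: merge the most
# similar clusters until average similarity drops below a cutoff, which
# stands in for the automated pruning of the dendrogram.

def cluster_by_similarity(sim, cutoff):
    """sim[i][j]: spectral similarity in [0, 1]; higher = more alike."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        # average-linkage similarity between every pair of clusters
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                s /= len(clusters[a]) * len(clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < cutoff:             # prune: remaining merges too dissimilar
            break
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters

# Four vocal elements: 0 and 1 sound alike, 2 and 3 sound alike.
sim = [[1.0, 0.9, 0.2, 0.1],
       [0.9, 1.0, 0.3, 0.2],
       [0.2, 0.3, 1.0, 0.8],
       [0.1, 0.2, 0.8, 1.0]]
types = cluster_by_similarity(sim, cutoff=0.5)
print(types)  # two vocalization "types"
```

The cutoff plays the role of the automated pruning step: it decides where the dendrogram is cut into discrete vocalization types.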
Project description: During active behavior in rodents, multiple orofacial sensorimotor behaviors, including sniffing and whisking, display rhythmicity in the theta range (~5-10 Hz). During specific behaviors, these rhythmic patterns interlock, such that execution of individual motor programs becomes dependent on the state of the others. Here we performed simultaneous recordings of the respiratory cycle and ultrasonic vocalization emission by adult rats and mice in social settings. We used automated analysis to examine the relationship between breathing patterns and vocalization over long time periods. Rat ultrasonic vocalizations (USVs, "50 kHz") were emitted within stretches of active sniffing (5-10 Hz) and were largely absent during periods of passive breathing (1-4 Hz). Because ultrasound was tightly linked to the exhalation phase, the sniffing cycle segmented vocal production into discrete calls and imposed its theta rhythmicity on their timing. In turn, calls briefly prolonged exhalations, causing an immediate drop in sniffing rate. Similar results were obtained in mice. Our results show that ultrasonic vocalizations are an integral part of the rhythmic orofacial behavioral ensemble. This complex behavioral program is thus involved not only in active sensing but also in the temporal structuring of social communication signals. Many other social signals of mammals, including monkey calls and human speech, show structure in the theta range. Our work points to a mechanism for such structuring in rodent ultrasonic vocalizations.
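The phase-locking of calls to exhalation can be quantified by mapping each call onset to its position within the enclosing sniff cycle. A minimal Python sketch with invented timestamps (the half-cycle inhale/exhale split is a simplifying assumption, not the study's respiration-based segmentation):

```python
from bisect import bisect_right

# Toy analysis: each sniff cycle starts with inhalation; a call whose onset
# falls in the second half of its cycle is counted as exhalation-locked.

def exhalation_fraction(cycle_starts, call_onsets):
    """Fraction of calls whose onset falls in the exhalation half-cycle."""
    n_exhale = 0
    for t in call_onsets:
        i = bisect_right(cycle_starts, t) - 1
        if i < 0 or i + 1 >= len(cycle_starts):
            continue                   # call outside the recorded cycles
        start, end = cycle_starts[i], cycle_starts[i + 1]
        phase = (t - start) / (end - start)
        if phase >= 0.5:               # crude inhale/exhale split
            n_exhale += 1
    return n_exhale / len(call_onsets)

# 7 Hz sniffing: cycle boundaries every ~0.143 s; invented call onsets
# placed late in their cycles, mimicking exhalation-locked USVs.
cycles = [round(k / 7.0, 4) for k in range(8)]
calls = [0.10, 0.25, 0.40, 0.55]
frac = exhalation_fraction(cycles, calls)
print(frac)
```

With real data the same mapping, applied to measured inhale/exhale transitions instead of a fixed half-cycle split, yields the phase histogram underlying the reported exhalation coupling.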
Project description: Songbirds are one of the few groups of animals that learn the sounds used for vocal communication during development. Like humans, songbirds memorize vocal sounds based on auditory experience with vocalizations of adult "tutors", and then use auditory feedback of self-produced vocalizations to gradually match their motor output to the memory of tutor sounds. In humans, investigations of early vocal learning have focused mainly on perceptual skills of infants, whereas studies of songbirds have focused on measures of vocal production. In order to fully exploit songbirds as a model for human speech, understand the neural basis of learned vocal behavior, and investigate links between vocal perception and production, studies of songbirds must examine both behavioral measures of perception and neural measures of discrimination during development. Here we used behavioral and electrophysiological assays of the ability of songbirds to distinguish vocal calls of varying frequencies at different stages of vocal learning. The results show that neural tuning in auditory cortex mirrors behavioral improvements in the ability to make perceptual distinctions of vocal calls as birds are engaged in vocal learning. Thus, separate measures of neural discrimination and behavioral perception yielded highly similar trends during the course of vocal development. The timing of this improvement in the ability to distinguish vocal sounds correlates with our previous work showing substantial refinement of axonal connectivity in cortico-basal ganglia pathways necessary for vocal learning.
Project description: The origin of human speech is still a hotly debated topic in science. Evidence of socially guided acoustic flexibility and proto-conversational rules has been found in several monkey species but is lacking in the social and cooperative great apes. Here we investigated spontaneous vocal interactions within a peaceful context in captive bonobos and reveal that vocal interactions obey temporal and social rules. Dyadic vocal interactions were characterized by call overlap avoidance and short inter-call intervals. Bonobos preferentially responded to conspecifics with whom they maintained close bonds. We also found that vocal sharing rate (the production rate of shared acoustic variants within each given dyad) was mostly explained by the age difference of callers, whereas other individual characteristics (sex, kinship) and social parameters (affinity in spatial proximity and in vocal interactions) were not explanatory. Our results show that great apes spontaneously display primitive conversational rules guided by social bonds. The demonstration that such coordinated vocal interactions are shared between monkeys, apes and humans fills a significant gap in our knowledge of vocal communication within the primate phylogeny and highlights the universal feature of social influence in vocal interactions.
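Overlap avoidance and short inter-call intervals can be scored directly from call onset/offset times. A minimal Python sketch with invented timings (not data from the study):

```python
# Toy scoring of dyadic turn-taking: count overlaps and collect inter-call
# intervals between two callers, the two quantities used to characterize
# the vocal interactions.

def turn_taking_stats(calls_a, calls_b):
    """calls_*: lists of (onset, offset) tuples in seconds, one per caller."""
    merged = sorted([(s, e, "A") for s, e in calls_a] +
                    [(s, e, "B") for s, e in calls_b])
    overlaps, gaps = 0, []
    for (s1, e1, c1), (s2, e2, c2) in zip(merged, merged[1:]):
        if c1 == c2:
            continue                   # only score between-caller transitions
        if s2 < e1:
            overlaps += 1              # responder started before caller ended
        else:
            gaps.append(s2 - e1)       # inter-call interval
    return overlaps, gaps

a = [(0.0, 0.8), (2.5, 3.2)]           # caller A's calls (invented)
b = [(1.0, 1.6), (3.5, 4.0)]           # caller B's calls (invented)
ov, gaps = turn_taking_stats(a, b)
print(ov, gaps)
```

Zero overlaps with consistently short gaps is the signature pattern the abstract reports for dyadic bonobo exchanges.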