Bottom-up processing of curvilinear visual features is sufficient for animate/inanimate object categorization.
ABSTRACT: Animate and inanimate objects differ in their intermediate visual features. For instance, animate objects tend to be more curvilinear than inanimate objects (e.g., Levin, Takarae, Miner, & Keil, 2001). Recently, it has been demonstrated that these differences in intermediate visual features are sufficient for categorization: human participants viewing synthesized images of animate and inanimate objects that differ largely in the amount of these features classify the objects as animate or inanimate significantly above chance (Long, Störmer, & Alvarez, 2017). A remaining question, however, is whether the observed categorization reflects top-down cognitive strategies (e.g., rectangular shapes are less likely to be animals) or bottom-up processing of the intermediate visual features per se, in the absence of such strategies. To address this issue, we repeated the classification experiment of Long et al. (2017) but, unlike that study, matched the synthesized images, on average, in the amount of image-based and perceived curvilinear and rectilinear information. Additionally, our synthesized images did not preserve global shape information and appeared as texture patterns. These changes prevented participants from using top-down cognitive strategies to perform the task. During the experiment, participants were presented with these synthesized, texture-like animate and inanimate images and, on each trial, were required to classify them as either animate or inanimate, with no feedback given. Participants were told that the synthesized images depicted abstract art patterns. We found that participants still classified the stimuli significantly above chance, even though they were unaware of their classification performance.
For both object categories, participants relied more on the curvilinear and less on the rectilinear image-based information present in the stimuli for classification. Surprisingly, the stimuli most consistently classified as animate corresponded to the most dangerous animals in our sample of images. We conclude that bottom-up processing of the intermediate features present in the visual input is sufficient for animate/inanimate object categorization, and that these features may convey information associated with the affective content of visual stimuli.
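The "significantly above chance" criterion in two-alternative classification experiments like this one is typically established with a binomial test against a 50% guessing rate. A minimal sketch of that test follows; the trial counts are purely illustrative, not the study's actual data:

```python
from math import comb

def p_above_chance(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """One-sided exact binomial test: the probability of scoring at least
    `correct` out of `trials` by guessing alone."""
    return sum(comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
               for k in range(correct, trials + 1))

# Illustrative counts (not from the study): 130/224 correct classifications.
print(round(p_above_chance(130, 224), 4))  # a small value indicates above-chance performance
```

A participant can thus perform reliably above chance even with an accuracy only modestly over 50%, which is consistent with observers being unaware of their own classification performance.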
Project description: Inferior temporal (IT) object representations have been intensively studied in monkeys and humans, but representations of the same particular objects have never been compared between the species. Moreover, IT's role in categorization is not well understood. Here, we presented monkeys and humans with the same images of real-world objects and measured the IT response pattern elicited by each image. To relate the representations between the species and to computational models, we compared response-pattern dissimilarity matrices. IT response patterns form category clusters, which match between the two species. The clusters correspond to animate and inanimate objects; within the animate objects, faces and bodies form subclusters. Within each category, IT distinguishes individual exemplars, and the within-category exemplar similarities also match between the species. Our findings suggest that primate IT across species may host a common code, which combines a categorical and a continuous representation of objects.
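The representational-similarity logic used here can be sketched compactly: each image's response pattern is a vector of responses across recorded units, the dissimilarity matrix holds 1 minus the Pearson correlation between each pair of patterns, and two such matrices are related by correlating their upper triangles. A minimal numpy sketch, with random data standing in for the neural recordings (all sizes and values are illustrative):

```python
import numpy as np

def rdm(patterns: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - Pearson r between
    response patterns (rows = images, columns = recorded units)."""
    return 1.0 - np.corrcoef(patterns)

def rdm_similarity(rdm_a: np.ndarray, rdm_b: np.ndarray) -> float:
    """Correlate the upper triangles of two RDMs over the same image set."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return float(np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1])

rng = np.random.default_rng(0)
# Stand-ins: 20 images x 100 "monkey" units, and a noisy "human" counterpart
# sharing the same underlying representational geometry.
monkey = rng.standard_normal((20, 100))
human = monkey + 0.5 * rng.standard_normal((20, 100))
print(rdm_similarity(rdm(monkey), rdm(human)))  # high when geometries match
```

Because the comparison happens at the level of dissimilarity matrices rather than raw responses, the two species (or a model and a brain region) need not share units, voxels, or even measurement modalities.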
Project description: Face perception is a vital part of human social interactions. The social value of faces makes their efficient detection evolutionarily advantageous. It has been suggested that this might occur nonconsciously, but experimental results are equivocal thus far. Here, we probe nonconscious face perception using a novel combination of binocular rivalry with continuous flash suppression and steady-state visually evoked potentials. In the first two experiments, participants viewed either non-face objects, neutral faces (Study 1), or fearful faces (Study 2). Consistent with the hypothesis that faces are processed nonconsciously, we found that faces broke through suppression faster than objects. We did not, however, observe a concomitant face-selective steady-state visually evoked potential. Study 3 was run to reconcile this paradox. We hypothesized that the faster breakthrough time was due to a mid-level visual feature, curvilinearity, rather than high-level category membership, which would explain the behavioral difference without neural evidence of face-selective processing. We tested this hypothesis by presenting participants with four different groups of stimuli outside of conscious awareness: rectilinear objects (e.g., chessboard), curvilinear objects (e.g., dartboard), faces, and objects that were not dominantly curvilinear or rectilinear. We found that faces and curvilinear objects broke through suppression faster than objects and rectilinear objects. Moreover, there was no difference between faces and curvilinear objects. These results support our hypothesis that the observed behavioral advantage for faces is due to their curvilinearity, rather than category membership.
Project description: Inferior temporal (IT) cortex in human and nonhuman primates serves visual object recognition. Computational object-vision models, although continually improving, do not yet reach human performance. It is unclear to what extent the internal representations of computational models can explain the IT representation. Here we investigate a wide range of computational model representations (37 in total), testing their categorization performance and their ability to account for the IT representational geometry. The models include well-known neuroscientific object-recognition models (e.g. HMAX, VisNet) along with several models from computer vision (e.g. SIFT, GIST, self-similarity features, and a deep convolutional neural network). We compared the representational dissimilarity matrices (RDMs) of the model representations with the RDMs obtained from human IT (measured with fMRI) and monkey IT (measured with cell recording) for the same set of stimuli (not used in training the models). Better-performing models were more similar to IT in that they showed greater clustering of representational patterns by category. In addition, better-performing models also more strongly resembled IT in terms of their within-category representational dissimilarities. Representational geometries were significantly correlated between IT and many of the models. However, the categorical clustering observed in IT was largely unexplained by the unsupervised models. The deep convolutional network, which was trained by supervision with over a million category-labeled images, reached the highest categorization performance and also best explained IT, although it did not fully explain the IT data. Combining the features of this model with appropriate weights and adding linear combinations that maximize the margin between animate and inanimate objects and between faces and other objects yielded a representation that fully explained our IT data.
Overall, our results suggest that explaining IT requires computational features trained through supervised learning to emphasize the behaviorally important categorical divisions prominently reflected in IT.
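The final step described above, adding linear feature combinations that maximize the margin between categories, amounts to training a large-margin linear readout on the model's features. A toy numpy sketch of such a readout, using full-batch subgradient descent on the hinge loss; the data, labels, and hyperparameters are synthetic stand-ins, not the study's procedure:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Large-margin linear readout: subgradient descent on the
    L2-regularized hinge loss. X: (n, d) features; y: labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1.0                       # points inside the margin
        grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic stand-in for model features: "animate" items offset on one axis.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = np.concatenate([np.ones(50), -np.ones(50)])
X[:50, 0] += 2.0
w, b = train_linear_svm(X, y)
accuracy = float((np.sign(X @ w + b) == y).mean())
print(accuracy)  # well above chance on this noisily separable toy problem
```

The direction w found this way is precisely the kind of supervised, category-emphasizing feature dimension that, per the results above, a purely unsupervised representation lacks.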
Project description: Simple geometric shapes that move in a self-propelled manner and violate Newtonian laws of motion by acting against gravitational forces tend to induce the judgement that an object is animate. Objects that change their motion only due to external causes are more likely to be judged as inanimate. How the developing brain supports the perception of animacy in early ontogeny is currently unknown. The aim of this study was to use ERP techniques to determine whether the negative central component (Nc), a waveform related to attention allocation, was differentially affected when an infant observed animate or inanimate motion. Short animated movies, comprising a marble moving along a marble run in either an animate or an inanimate manner, were presented to 15 infants who were 9 months of age. The ERPs were time-locked to a still frame representing animate or inanimate motion that was displayed following each movie. We found that 9-month-olds are able to discriminate between animate and inanimate motion based on motion cues alone and most likely allocate more attentional resources to the inanimate motion. The present data contribute to our understanding of the animate-inanimate distinction and of the Nc as a correlate of infant cognitive processing.
Project description: Nature is composed of self-propelled, animate agents and inanimate objects. Laboratory studies have shown that human infants and a few species discriminate between animate and inanimate objects. This ability is assumed to have evolved to support social cognition and filial imprinting, but its ecological role for wild animals has never been examined. An alternative, functional explanation is that discriminating stimuli based on their potential for animacy helps animals distinguish between harmless and threatening stimuli. Using remote-controlled experimental stimulus presentations, we tested whether wild jackdaws (Corvus monedula) respond fearfully to stimuli that violate expectations for movement. Breeding pairs (N = 27) were presented at their nests with moving and non-moving models of ecologically relevant stimuli (birds, snakes and sticks) that differed in threat level and propensity for independent motion. Jackdaws were startled by movement regardless of stimulus type and produced more alarm calls when faced with animate objects. However, they delayed longest in entering their nest-box after encountering a stimulus that should not move independently, suggesting they recognized the movement as unexpected. How jackdaws develop expectations about object movement is not clear, but our results suggest that discriminating between animate and inanimate stimuli may trigger information gathering about potential threats.
Project description: On average, we urban dwellers spend about 90% of our time indoors, and share the intuition that the physical features of the places we live and work in influence how we feel and act. However, there is surprisingly little research on how architecture impacts behavior, much less on how it influences brain function. To begin closing this gap, we conducted a functional magnetic resonance imaging study to examine how systematic variation in contour impacts aesthetic judgments and approach-avoidance decisions, outcome measures of interest to both architects and users of spaces alike. As predicted, participants were more likely to judge spaces as beautiful if they were curvilinear than rectilinear. Neuroanatomically, when contemplating beauty, curvilinear contour activated the anterior cingulate cortex exclusively, a region strongly responsive to the reward properties and emotional salience of objects. Complementing this finding, pleasantness, the valence dimension of the affect circumplex, accounted for nearly 60% of the variance in beauty ratings. Furthermore, activation in a distributed brain network known to underlie the aesthetic evaluation of different types of visual stimuli covaried with beauty ratings. In contrast, contour did not affect approach-avoidance decisions, although curvilinear spaces activated the visual cortex. The results suggest that the well-established effect of contour on aesthetic preference can be extended to architecture. Furthermore, the combination of our behavioral and neural evidence underscores the role of emotion in our preference for curvilinear objects in this domain.
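A "percent of variance accounted for" figure of this kind corresponds to the R-squared of a regression of one rating on the other. A toy numpy sketch with synthetic ratings (the numbers are illustrative, not the study's data):

```python
import numpy as np

def variance_explained(x: np.ndarray, y: np.ndarray) -> float:
    """R^2 of a simple linear regression of y on x: the proportion of
    variance in y accounted for by the fitted line."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()

# Synthetic ratings: beauty driven partly by pleasantness, partly by noise.
rng = np.random.default_rng(2)
pleasantness = rng.standard_normal(200)
beauty = 1.2 * pleasantness + rng.standard_normal(200)
print(variance_explained(pleasantness, beauty))  # near 0.6 at this signal-to-noise ratio
```

For a single predictor, this R-squared equals the squared Pearson correlation between the two rating scales.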
Project description: Humans have a tendency to perceive inanimate objects as animate based on simple motion cues. Although animacy is considered a complex cognitive property, this recognition seems to be spontaneous. Researchers have found that young human infants discriminate between dependent and independent movement patterns. However, quick visual perception of animate entities may be crucial to non-human species as well. Based on general mammalian homology, dogs may possess similar skills to humans. Here, we investigated whether dogs and humans discriminate similarly between dependent and independent motion patterns performed by geometric shapes. We projected a side-by-side video display of the two patterns and measured looking times towards each side, in two trials. We found that in Trial 1, both dogs and humans were equally interested in the two patterns, but in Trial 2, in both species, looking times towards the dependent pattern decreased, whereas they increased towards the independent pattern. We argue that dogs and humans spontaneously recognized the specific pattern and habituated to it rapidly, but continued to show interest in the 'puzzling' pattern. This suggests that both species tend to recognize inanimate agents as animate relying solely on their motion patterns.
Project description: Primates are highly attuned not just to social characteristics of individual agents, but also to social interactions between multiple agents. Here we report a neural correlate of the representation of social interactions in the human brain. Specifically, we observe a strong univariate response in the posterior superior temporal sulcus (pSTS) to stimuli depicting social interactions between two agents, compared with (i) pairs of agents not interacting with each other, (ii) physical interactions between inanimate objects, and (iii) individual animate agents pursuing goals and interacting with inanimate objects. We further show that this region contains information about the nature of the social interaction: specifically, whether one agent is helping or hindering the other. This sensitivity to social interactions is strongest in a specific subregion of the pSTS but extends to a lesser extent into nearby regions previously implicated in theory of mind and dynamic face perception. This sensitivity to the presence and nature of social interactions is not easily explainable in terms of low-level visual features, attention, or the animacy, actions, or goals of individual agents. This region may underlie our ability to understand the structure of our social world and navigate within it.
Project description: BACKGROUND: How do we estimate time when watching an action? The idea that events are timed by a centralized clock has recently been called into question in favour of distributed, specialized mechanisms. Here we provide evidence for a critical specialization: animate and inanimate events are separately timed by humans. METHODOLOGY/PRINCIPAL FINDINGS: In different experiments, observers were asked to intercept a moving target or to discriminate the duration of a stationary flash while viewing different scenes. Time estimates were systematically shorter in the sessions involving human characters moving in the scene than in those involving inanimate moving characters. Remarkably, the animate/inanimate context also affected randomly intermingled trials which always depicted the same still character. CONCLUSIONS/SIGNIFICANCE: The existence of distinct time bases for animate and inanimate events might be related to the partial segregation of the neural networks processing these two categories of objects, and could enhance our ability to predict critically timed actions.
Project description: The anterior inferotemporal cortex (IT) is the highest stage along the hierarchy of visual areas that, in primates, processes visual objects. Although several lines of evidence suggest that IT primarily represents visual shape information, some recent studies have argued that neuronal ensembles in IT code the semantic membership of visual objects (i.e., represent conceptual classes such as animate and inanimate objects). In this study, we investigated to what extent semantic, rather than purely visual, information is represented in IT by performing a multivariate analysis of IT responses to a set of visual objects. By relying on a variety of machine-learning approaches (including a cutting-edge clustering algorithm that has been recently developed in the domain of statistical physics), we found that, in most instances, IT representation of visual objects is accounted for by their similarity at the level of shape or, more surprisingly, low-level visual properties. Only in a few cases did we observe IT representations of semantic classes that were not explainable by the visual similarity of their members. Overall, these findings reassert the primary function of IT as a conveyor of explicit visual shape information, and reveal that low-level visual properties are represented in IT to a greater extent than previously appreciated. In addition, our work demonstrates how combining a variety of state-of-the-art multivariate approaches, and carefully estimating the contribution of shape similarity to the representation of object categories, can substantially advance our understanding of neuronal coding of visual objects in cortex.