Human-Object Interactions Are More than the Sum of Their Parts.
ABSTRACT: Understanding human-object interactions is critical for extracting meaning from everyday visual scenes and requires integrating complex relationships between human pose and object identity into a new percept. To understand how the brain builds these representations, we conducted two fMRI experiments in which subjects viewed humans interacting with objects, noninteracting human-object pairs, and isolated humans and objects. A number of visual regions process features of human-object interactions, including object identity information in the lateral occipital complex (LOC) and parahippocampal place area (PPA), and human pose information in the extrastriate body area (EBA) and posterior superior temporal sulcus (pSTS). Representations of human-object interactions in some regions, such as the posterior PPA (retinotopic maps PHC1 and PHC2), are well predicted by a simple linear combination of the responses to object and pose information. Other regions, however, especially the pSTS, exhibit representations of human-object interaction categories that are not predicted by their individual components, indicating that they encode human-object interactions as more than the sum of their parts. These results reveal the distributed networks underlying the emergent representation of human-object interactions necessary for social perception.
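The linear-combination test described above can be sketched in a few lines: fit weights that predict a region's interaction response pattern from its responses to the isolated components, then ask how well the weighted sum matches the observed pattern. This is a minimal illustration on synthetic data (the voxel counts, weights, and noise level are all hypothetical, not values from the study).

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
n_voxels = 200  # hypothetical region size

# Hypothetical voxel response patterns to the isolated components
# and to the combined human-object interaction.
pose_pattern = rng.normal(size=n_voxels)
object_pattern = rng.normal(size=n_voxels)

# Simulate a region whose interaction response IS a linear combination
# of its components, plus measurement noise (weights 0.6/0.4 are arbitrary).
interaction_pattern = (0.6 * pose_pattern + 0.4 * object_pattern
                       + 0.1 * rng.normal(size=n_voxels))

# Fit the linear-combination model by least squares.
X = np.column_stack([pose_pattern, object_pattern])
weights, *_ = lstsq(X, interaction_pattern, rcond=None)
predicted = X @ weights

# Correlation between predicted and observed patterns quantifies how well
# the region is explained as "the sum of its parts"; a region encoding
# emergent interaction information would show a poor fit instead.
r = np.corrcoef(predicted, interaction_pattern)[0, 1]
print(f"weights = {weights.round(2)}, r = {r:.2f}")
```

For a region like the posterior PPA the fit would be good (as simulated here); for a region like the pSTS the reported result implies the residual, unexplained pattern component carries the interaction category information.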
Project description: Knowledge of objects in the world is stored in our brains as rich, multimodal representations. Because the neural pathways that process this diverse sensory information are largely anatomically distinct, a fundamental challenge for cognitive neuroscience is to explain how the brain binds the different sensory features that comprise an object to form meaningful, multimodal object representations. Studies with nonhuman primates suggest that a structure at the culmination of the object recognition system (the perirhinal cortex) performs this critical function. In contrast, human neuroimaging studies implicate the posterior superior temporal sulcus (pSTS). The results of the functional MRI study reported here resolve this apparent discrepancy by demonstrating that both the pSTS and the perirhinal cortex contribute to crossmodal binding in humans, but in different ways. Significantly, only perirhinal cortex activity is modulated by meaning variables (e.g., semantic congruency and semantic category), suggesting that these two regions play complementary functional roles, with the pSTS acting as a presemantic, heteromodal region for crossmodal perceptual features, and the perirhinal cortex integrating these features into higher-level conceptual representations. This interpretation is supported by the results of our behavioral study: patients with lesions including the perirhinal cortex, but not patients with damage restricted to frontal cortex, were impaired on the same crossmodal integration task, and their performance was significantly influenced by the same semantic factors, mirroring the functional MRI findings. These results integrate nonhuman and human primate research by providing converging evidence that human perirhinal cortex is also critically involved in processing meaningful aspects of multimodal object representations.
Project description: Real-world visual scenes are complex, cluttered, and heterogeneous stimuli engaging scene- and object-selective cortical regions including the parahippocampal place area (PPA), retrosplenial complex (RSC), and lateral occipital complex (LOC). To understand the unique contribution of each region to distributed scene representations, we generated predictions based on a neuroanatomical framework adapted from monkeys and tested them using minimal scenes in which we independently manipulated both spatial layout (open, closed, and gradient) and object content (furniture, e.g., bed, dresser). Commensurate with its strong connectivity with posterior parietal cortex, RSC evidenced strong spatial layout information but no object information, and its response was not even modulated by object presence. In contrast, LOC, which lies within the ventral visual pathway, contained strong object information but no background information. Finally, PPA, which is connected with both the dorsal and the ventral visual pathway, showed information about both objects and spatial backgrounds and was sensitive to the presence or absence of either. These results suggest that 1) LOC, PPA, and RSC have distinct representations, emphasizing different aspects of scenes, 2) the specific representations in each region are predictable from their patterns of connectivity, and 3) PPA combines both spatial layout and object information, as predicted by its connectivity.
Project description: Recent behavioural evidence shows that visual displays of two individuals interacting are not simply encoded as separate individuals, but as an interactive unit that is 'more than the sum of its parts'. Recent functional magnetic resonance imaging (fMRI) evidence shows the importance of the posterior superior temporal sulcus (pSTS) in processing human social interactions, and suggests that it may represent such interactions as qualitatively 'greater' than the average of their constituent parts. The current study aimed to investigate whether the pSTS or other posterior temporal lobe regions 1) demonstrated evidence of a dyadic information effect, that is, qualitatively different responses to an interacting dyad than to the averaged responses of the same two interactors presented in isolation; and 2) significantly differentiated between different types of social interactions. Multivoxel pattern analysis was performed in which a classifier was trained to differentiate between qualitatively different types of dyadic interactions. Above-chance classification of interactions was observed in 'interaction-selective' pSTS-I and the extrastriate body area (EBA), but not in other regions of interest (i.e. the face-selective STS and the mentalizing-selective temporo-parietal junction). A dyadic information effect was not observed in the pSTS-I, but instead in the EBA; that is, classification of dyadic interactions did not fully generalise to the averaged responses to the isolated interactors, indicating that dyadic representations in the EBA contain unique information that cannot be recovered from the interactors presented in isolation. These findings complement previous observations of congruent grouping of human bodies and objects in the broader lateral occipitotemporal cortex.
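The cross-decoding logic behind the dyadic information effect can be sketched as follows: train a classifier on patterns evoked by dyad displays, then test it both on held-in dyad patterns and on the averaged patterns of the isolated interactors. A drop in cross-decoding accuracy indicates dyad-specific information. This sketch uses synthetic voxel data with made-up sizes and noise levels, not the study's data or pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_voxels, n_trials = 100, 40  # hypothetical ROI size and trial count

# Synthetic mean voxel patterns for two interaction categories.
# A dyad pattern is modelled as the average of its two interactors
# plus an "emergent" component present only when the dyad is shown.
avg_mean = {c: rng.normal(size=n_voxels) for c in (0, 1)}
emergent = {c: rng.normal(size=n_voxels) for c in (0, 1)}

def simulate(means):
    """Noisy single-trial patterns around each category mean."""
    X = np.vstack([means[c] + 0.5 * rng.normal(size=n_voxels)
                   for c in (0, 1) for _ in range(n_trials)])
    y = np.repeat([0, 1], n_trials)
    return X, y

dyad_X, dyad_y = simulate({c: avg_mean[c] + emergent[c] for c in (0, 1)})
avg_X, avg_y = simulate(avg_mean)

# Train on dyad displays, then test generalisation to the averaged
# isolated-interactor patterns. A cross-decoding accuracy below the
# within-condition accuracy is the signature of dyad-specific
# ("more than the sum") information, as reported for the EBA.
clf = LinearSVC().fit(dyad_X, dyad_y)
within = clf.score(dyad_X, dyad_y)
cross = clf.score(avg_X, avg_y)
print(f"within-condition: {within:.2f}, cross-decoding: {cross:.2f}")
```

In a real analysis the within-condition score would of course come from cross-validation rather than the training set; the point here is only the train-on-dyads, test-on-averages structure.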
Project description: We internally represent the structure of our surroundings even when there is little layout information available in the visual image, such as when walking through fog or darkness. One way in which we disambiguate such scenes is through object cues; for example, seeing a boat supports the inference that the foggy scene is a lake. Recent studies have investigated the neural mechanisms by which object and scene processing interact to support object perception. The current study examines the reverse interaction, by which objects facilitate the neural representation of scene layout. Photographs of indoor (closed) and outdoor (open) real-world scenes were blurred such that they were difficult to categorize on their own but easily disambiguated by the inclusion of an object. fMRI decoding was used to measure scene representations in the scene-selective parahippocampal place area (PPA) and occipital place area (OPA). Classifiers were trained to distinguish response patterns to fully visible indoor and outdoor scenes, presented in an independent experiment. Testing these classifiers on blurred scenes revealed a strong improvement in classification in left PPA and OPA when objects were present, despite the reduced low-level visual feature overlap with the training set in this condition. These findings were specific to left PPA/OPA, with no evidence for object-driven facilitation in right PPA/OPA, object-selective areas, and early visual cortex. These findings demonstrate separate roles for left and right scene-selective cortex in scene representation, whereby left PPA/OPA represents inferred scene layout, influenced by contextual object cues, and right PPA/OPA represents a scene's visual features.
Project description: Primary progressive aphasia (PPA), a selective neurodegeneration of the language network, frequently causes object naming impairments. We examined the N400 event-related potential (ERP) to explore interactions between object recognition and word processing in 20 PPA patients and 15 controls. Participants viewed photographs of objects, each followed by a word that was either a match to the object, a semantically related mismatch, or an unrelated mismatch. Patients judged whether word-object pairs matched with high accuracy (94% PPA group; 98% control group), but they failed to exhibit the normal N400 category effect (N400c), defined as a larger N400 to unrelated versus related mismatch words. In contrast, the N400 mismatch effect (N400m), defined as a larger N400 to mismatch than match words, was observed in both groups. N400m magnitude was positively correlated with neuropsychological measures of word comprehension but not fluency or grammatical competence, and therefore reflected the semantic component of naming. After ERP testing, patients were asked to name the same set of objects aloud. Trials with objects that could not be named were found to lack an N400m, although the name had been correctly recognized at the matching stage. Even accurate overt naming did not necessarily imply normal semantic processing, as shown by the absent N400c. The N400m was preserved in one patient with postsemantic anomia, who could write the names of objects she could not verbalize. N400 analyses can thus help dissect the multiple cognitive mechanisms that contribute to object naming failures in PPA.
Project description: One of the major lessons of memory research has been that human memory is fallible, imprecise, and subject to interference. Thus, although observers can remember thousands of images, it is widely assumed that these memories lack detail. Contrary to this assumption, here we show that long-term memory is capable of storing a massive number of objects with details from the image. Participants viewed pictures of 2,500 objects over the course of 5.5 h. Afterward, they were shown pairs of images and indicated which of the two they had seen. The previously viewed item could be paired with either an object from a novel category, an object of the same basic-level category, or the same object in a different state or pose. Performance in each of these conditions was remarkably high (92%, 88%, and 87%, respectively), suggesting that participants successfully maintained detailed representations of thousands of images. These results have implications for cognitive models, in which capacity limitations impose a primary computational constraint (e.g., models of object recognition), and pose a challenge to neural models of memory storage and retrieval, which must be able to account for such a large and detailed storage capacity.
Project description: The brain represents visual objects with topographic cortical patterns. To address how distributed visual representations enable object categorization, we established predictive encoding models based on a deep residual network and trained them to predict cortical responses to natural movies. Using this predictive model, we mapped human cortical representations of 64,000 visual objects from 80 categories with high throughput and accuracy. Such representations covered both the ventral and dorsal pathways, reflected multiple levels of object features, and preserved semantic relationships between categories. In the entire visual cortex, object representations were organized into three clusters of categories: biological objects, non-biological objects, and background scenes. At a finer scale specific to each cluster, object representations revealed sub-clusters for further categorization. Such hierarchical clustering of category representations was mostly contributed by cortical representations of object features from middle to high levels. In summary, this study demonstrates a useful computational strategy to characterize the cortical organization and representations of visual features for rapid categorization.
Project description: Object recognition is challenging because each object produces myriad retinal images. Responses of neurons from the inferior temporal cortex (IT) are selective to different objects, yet tolerant ("invariant") to changes in object position, scale, and pose. How does the brain construct this neuronal tolerance? We report a form of neuronal learning that suggests the underlying solution. Targeted alteration of the natural temporal contiguity of visual experience caused specific changes in IT position tolerance. This unsupervised temporal slowness learning (UTL) was substantial, increased with experience, and was significant in single IT neurons after just 1 hour. Together with previous theoretical work and human object perception experiments, we speculate that UTL may reflect the mechanism by which the visual stream builds and maintains tolerant object representations.
Project description: We used functional magnetic resonance imaging (fMRI) to demonstrate the existence of a mechanism in the human lateral occipital (LO) cortex that supports recognition of real-world visual scenes through parallel analysis of within-scene objects. Neural activity was recorded while subjects viewed four categories of scenes and eight categories of 'signature' objects strongly associated with the scenes in three experiments. Multivoxel patterns evoked by scenes in the LO cortex were well predicted by the average of the patterns elicited by their signature objects. By contrast, there was no relationship between scene and object patterns in the parahippocampal place area (PPA), even though this region responds strongly to scenes and is believed to be crucial for scene identification. By combining information about multiple objects within a scene, the LO cortex may support an object-based channel for scene recognition that complements the processing of global scene properties in the PPA.
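The object-average prediction described above is simpler than a fitted linear model: the scene pattern is compared directly against the unweighted mean of its signature objects' patterns. A minimal sketch on synthetic data (the object names, voxel count, and noise level are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels = 150  # hypothetical LO region size

# Hypothetical multivoxel patterns for two 'signature' objects and for
# the scene that typically contains them (e.g. bed + dresser for a
# bedroom). We simulate an LO-like region whose scene pattern resembles
# the average of its objects' patterns, plus measurement noise.
bed = rng.normal(size=n_voxels)
dresser = rng.normal(size=n_voxels)
bedroom = (bed + dresser) / 2 + 0.3 * rng.normal(size=n_voxels)

# Object-average prediction of the scene pattern, scored by correlation.
predicted = (bed + dresser) / 2
r = np.corrcoef(predicted, bedroom)[0, 1]
print(f"scene ~ object average: r = {r:.2f}")
```

In a PPA-like region, by this account, the scene pattern would be unrelated to the object average and r would hover near zero.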
Project description: One key ability of the human brain is invariant object recognition: the rapid and accurate recognition of objects in the presence of variations such as size, rotation, and position. Despite decades of research into the topic, it remains unknown how the brain constructs invariant representations of objects. By providing brain-plausible object representations and reaching human-level accuracy in recognition, hierarchical models of human vision have suggested that the human brain implements similar feed-forward operations to obtain invariant representations. However, in two psychophysical object recognition experiments on humans with systematically controlled variations of objects, we observed that humans relied on specific (diagnostic) object regions for accurate recognition, and these regions remained relatively consistent (invariant) across variations, whereas feed-forward feature-extraction models selected view-specific (non-invariant) features across variations. This suggests that models can develop different strategies from humans yet still reach human-level recognition performance. Moreover, human individuals largely disagreed on their diagnostic features and flexibly shifted their feature extraction strategy from view-invariant to view-specific when objects became more similar. This implies that, even in rapid object recognition, rather than relying on a hard-wired set of feed-forward mechanisms that extract diagnostic features from objects, the bottom-up visual pathways receive task-related information, possibly processed in prefrontal cortex, through top-down connections.