Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning.
ABSTRACT: Orbitofrontal cortex (OFC) is widely held to be critical for flexibility in decision-making when established choice values change. OFC's role in such decision-making was investigated in macaques performing dynamically changing three-armed bandit tasks. After selective OFC lesions, animals were impaired at discovering the identity of the highest-value stimulus following reversals. However, this impairment was caused not by diminished behavioral flexibility or insensitivity to reinforcement changes, but by paradoxical increases in switching between all stimuli. This pattern of choice behavior could be explained by a causal role for OFC in appropriate contingent learning, the process by which causal responsibility for a particular reward is assigned to a particular choice. After OFC lesions, animals' choice behavior no longer reflected the history of precise conjoint relationships between particular choices and particular rewards. Nonetheless, OFC-lesioned animals could still approximate choice-outcome associations using a recency-weighted history of choices and rewards.
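The dissociation described above, precise conjoint credit assignment versus a recency-weighted approximation, can be sketched with two toy learners. This is a hypothetical illustration, not the paper's fitted model; the option count, learning rate, and trace-based credit rule are assumptions:

```python
def contingent_learner(choices, rewards, n_options=3, alpha=0.3):
    """Delta-rule learner that credits each reward to the exact choice
    that produced it (conjoint choice-outcome pairing)."""
    q = [0.0] * n_options
    for c, r in zip(choices, rewards):
        q[c] += alpha * (r - q[c])  # only the chosen option is updated
    return q

def recency_weighted_learner(choices, rewards, n_options=3, alpha=0.3):
    """Hypothetical approximation: rewards are spread over a
    recency-weighted trace of recent choices rather than credited to
    the one choice that earned them."""
    trace = [0.0] * n_options  # recency-weighted choice history
    q = [0.0] * n_options
    for c, r in zip(choices, rewards):
        trace = [(1 - alpha) * t for t in trace]  # decay old choices
        trace[c] += alpha                          # bump the latest choice
        total = sum(trace)
        for i in range(n_options):
            credit = trace[i] / total  # share of credit for this reward
            q[i] += alpha * credit * (r - q[i])
    return q

# Option 0 is always rewarded, option 1 never:
choices, rewards = [0, 1] * 10, [1, 0] * 10
q_contingent = contingent_learner(choices, rewards)        # credit stays precise
q_recency = recency_weighted_learner(choices, rewards)     # credit leaks to option 1
```

Under the recency-weighted rule, the never-rewarded option acquires positive value because it sits in the choice trace when nearby rewards arrive, mirroring the smeared choice-outcome associations attributed to the lesioned animals.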
Project description: We make choices based on the values of expected outcomes, informed by previous experience in similar settings. When the outcomes of our decisions consistently violate expectations, new learning is needed to maximize rewards. Yet not every surprising event indicates a meaningful change in the environment. Even when conditions are stable overall, outcomes of a single experience can still be unpredictable due to small fluctuations (i.e., expected uncertainty) in reward or costs. In the present work, we investigate causal contributions of the basolateral amygdala (BLA) and orbitofrontal cortex (OFC) in rats to learning under expected outcome uncertainty in a novel delay-based task that incorporates both predictable fluctuations and directional shifts in outcome values. We demonstrate that OFC is required to accurately represent the distribution of wait times to stabilize choice preferences despite trial-by-trial fluctuations in outcomes, whereas BLA is necessary for the facilitation of learning in response to surprising events.
Project description: Orbitofrontal cortex (OFC), medial frontal cortex (MFC), and amygdala mediate stimulus-reward learning, but the mechanisms through which they interact are unclear. Here, we investigated how neurons in macaque OFC and MFC signaled rewards and the stimuli that predicted them during learning with and without amygdala input. Macaques performed a task that required them to evaluate two stimuli and then choose one to receive the reward associated with that option. Four main findings emerged. First, amygdala lesions slowed the acquisition and use of stimulus-reward associations. Further analyses indicated that this impairment was due, at least in part, to ineffective use of negative feedback to guide subsequent decisions. Second, the activity of neurons in OFC and MFC rapidly evolved to encode the amount of reward associated with each stimulus. Third, amygdalectomy reduced encoding of stimulus-reward associations during the evaluation of different stimuli; encoding of anticipated and received rewards after choices were made was not altered. Fourth, amygdala lesions led to an increase in the proportion of neurons in MFC, but not OFC, that encoded the instrumental response that monkeys made on each trial. These correlated changes in behavior and neural activity after amygdala lesions strongly suggest that the amygdala contributes to the ability to learn stimulus-reward associations rapidly by shaping encoding within OFC and MFC. SIGNIFICANCE STATEMENT: Altered functional interactions among orbital frontal cortex (OFC), medial frontal cortex (MFC), and amygdala are thought to underlie several psychiatric conditions, many related to reward learning. Here, we investigated the causal contribution of the amygdala to the development of neuronal activity in macaque OFC and MFC related to rewards and the stimuli that predict them during learning. Without amygdala inputs, neurons in both OFC and MFC showed decreased encoding of stimulus-reward associations.
MFC also showed increased encoding of the instrumental responses that monkeys made on each trial. Behaviorally, changes in neural activity were accompanied by slower stimulus-reward learning. The findings suggest that interactions among amygdala, OFC, and MFC contribute to learning about stimuli that predict rewards.
Project description: Behavioral inflexibility is a common symptom of neuropsychiatric disorders, and it can have a major detrimental impact on quality of life. While the orbitofrontal cortex (OFC) has been strongly implicated in behavioral flexibility in rodents across paradigms, our understanding of how the OFC mediates these behaviors is rapidly evolving. Here we examined neuronal activity during reversal learning by coupling in vivo electrophysiological recording with a mouse touch-screen learning paradigm to further elucidate the role of the OFC in updating reward value. Single-unit and oscillatory activity were recorded during well-learned discrimination and three distinct phases of reversal (early, chance, and well-learned). During touch-screen performance, OFC neuronal firing tracked rewarded responses following a previous rewarded choice when behavior was well learned, but shifted to primarily track repeated errors following a previous error in early reversal. Spike activity tracked rewarded choices independent of previous trial outcome during chance reversal, and returned to the initial pattern of reward response at criterion. Analysis of spike coupling to oscillatory local field potentials showed that less frequently occurring behaviors had significantly fewer neurons locked to any oscillatory frequency. Together, these data support a role for the OFC in tracking the value of individual choices to inform future responses and suggest that oscillatory signaling may be involved in propagating responses to increase or decrease the likelihood that an action is taken in the future. They further support the use of touch-screen paradigms in preclinical studies to more closely model clinical approaches to measuring behavioral flexibility.
Project description: We examined the contribution of the amygdala to value signals within orbital prefrontal cortex (OFC) and medial prefrontal cortex (MFC). On each trial, monkeys chose between two stimuli that were associated with different quantities of reward. In intact monkeys, as expected, neurons in both OFC and MFC signaled the reward quantity associated with stimuli. Compared with MFC, OFC contained a larger proportion of neurons encoding reward quantity and did so with faster response latencies. Removing the amygdala eliminated these differences, mainly by decreasing value coding in OFC. Similar decreases occurred in OFC immediately before and after reward delivery. Although the amygdala projects to both OFC and MFC, we found that it has its greatest influence over reward-value coding in OFC. Notably, amygdala lesions did not abolish value coding in OFC, which shows that OFC's representations of the value of objects, choices, and outcomes depend, in large part, on other sources.
Project description: The ability to select an appropriate behavioral response guided by previous emotional experiences is critical for survival. Although much is known about brain mechanisms underlying emotional associations, little is known about how these associations guide behavior when several choices are available. To address this, we performed local pharmacological inactivations of several cortical regions before retrieval of an aversive memory in choice-based versus no-choice-based conditioned taste aversion (CTA) tasks in rats. Interestingly, we found that inactivation of the orbitofrontal cortex (OFC), but not the dorsal or ventral medial prefrontal cortices, blocked retrieval of choice CTA. However, OFC inactivation left retrieval of no-choice CTA intact, suggesting its role in guiding choice, but not in retrieval of CTA memory. Consistently, OFC activity increased in the choice condition compared with no-choice, as measured with c-Fos immunolabeling. Notably, OFC inactivation did not affect choice behavior when it was guided by innate taste aversion. Consistent with an anterior insular cortex (AIC) involvement in storing taste memories, we found that AIC inactivation impaired retrieval of both choice and no-choice CTA. Therefore, this study provides evidence for OFC's role in guiding choice behavior and shows that this is dissociable from AIC-dependent taste aversion memory. Together, our results suggest that OFC is required and recruited to guide choice selection between options of taste associations relayed from AIC. SIGNIFICANCE STATEMENT: Survival and mental health depend on being able to choose stimuli not associated with danger. This is particularly important when danger is associated with stimuli that we ingest. Although much is known about the brain mechanisms that underlie associations with dangerous taste stimuli, very little is known about how these stored emotional associations guide behavior when it involves choice.
By combining pharmacological and immunohistochemical tools with taste-guided tasks, our study provides evidence for the key role of orbitofrontal cortex activity in choice behavior and shows that this is dissociable from the adjacent insular cortex-dependent taste aversion memory. Understanding the brain mechanisms that underlie the impact that emotional associations have on survival choice behaviors may lead to better treatments for mental disorders characterized by emotional decision-making deficits.
Project description: Outcome-guided behavior requires knowledge about the current value of expected outcomes. Such behavior can be isolated in the reinforcer devaluation task, which assesses the ability to infer the current value of specific rewards after devaluation. Animal lesion studies demonstrate that orbitofrontal cortex (OFC) is necessary for normal behavior in this task, but a causal role for human OFC in outcome-guided behavior has not been established. Here, we used sham-controlled, non-invasive, continuous theta-burst stimulation (cTBS) to temporarily disrupt human OFC network activity by stimulating a site in the lateral prefrontal cortex that is strongly connected to OFC prior to devaluation of food odor rewards. Subjects in the sham group appropriately avoided Pavlovian cues associated with devalued food odors. However, subjects in the stimulation group persistently chose those cues, even though devaluation of food odors themselves was unaffected by cTBS. This behavioral impairment was mirrored in changes in resting-state functional magnetic resonance imaging (rs-fMRI) activity such that subjects in the stimulation group exhibited reduced OFC network connectivity after cTBS, and the magnitude of this reduction was correlated with choices after devaluation. These findings demonstrate the feasibility of indirectly targeting the human OFC with non-invasive cTBS and indicate that OFC is specifically required for inferring the value of expected outcomes.
Project description: To successfully evaluate potential courses of action and choose the most favorable, we must consider the outcomes that may result. Many choices involve risk, our assessment of which may lead us to success or failure in matters financial, legal or health-related. The orbitofrontal cortex (OFC) has been implicated as critical for evaluating choices based on risk. To measure how outcomes of risky decisions are represented in the OFC, we recorded the electrophysiological activity of single neurons while rats made behavioral responses to obtain rewards under conditions of either certainty or risk. Rats exhibited different risk-preferences when given the opportunity to choose. In risk-preferring rats, OFC responses were enhanced following the delivery of large rewards obtained under risk compared with smaller, certain rewards and reward omission. However, in risk-neutral rats, neurons showed similarly enhanced responses to both large rewards obtained under risk and smaller, certain rewards compared with reward omission. Thus, the responses of OFC neurons reflected the subjective evaluation of outcomes in individuals with different risk-preferences. Such enhanced neural responding to risky rewards may serve to bias individuals towards risk-preference in decision-making.
Project description: Memory can inform goal-directed behavior by linking current opportunities to past outcomes. The orbitofrontal cortex (OFC) may guide value-based responses by integrating the history of stimulus-reward associations into expected outcomes, representations of predicted hedonic value and quality. Alternatively, the OFC may rapidly compute flexible "online" reward predictions by associating stimuli with the latest outcome. OFC neurons develop predictive codes when rats learn to associate arbitrary stimuli with outcomes, but the extent to which predictive coding depends on most recent events and the integrated history of rewards is unclear. To investigate how reward history modulates OFC activity, we recorded OFC ensembles as rats performed spatial discriminations that differed only in the number of rewarded trials between goal reversals. The firing rate of single OFC neurons distinguished identical behaviors guided by different goals. When >20 rewarded trials separated goal switches, OFC ensembles developed stable and anticorrelated population vectors that predicted overall choice accuracy and the goal selected in single trials. When <10 rewarded trials separated goal switches, OFC population vectors decorrelated rapidly after each switch, but did not develop anticorrelated firing patterns or predict choice accuracy. The results show that, whereas OFC signals respond rapidly to contingency changes, they predict choices only when reward history is relatively stable, suggesting that consecutive rewarded episodes are needed for OFC computations that integrate reward history into expected outcomes. SIGNIFICANCE STATEMENT: Adapting to changing contingencies and making decisions engages the orbitofrontal cortex (OFC). Previous work shows that OFC function can either improve or impair learning depending on reward stability, suggesting that OFC guides behavior optimally when contingencies apply consistently.
The mechanisms that link reward history to OFC computations remain obscure. Here, we examined OFC unit activity as rodents performed tasks controlled by contingencies that varied reward history. When contingencies were stable, OFC neurons signaled past, present, and pending events; when contingencies were unstable, past and present coding persisted, but predictive coding diminished. The results suggest that OFC mechanisms require stable contingencies across consecutive episodes to integrate reward history, represent predicted outcomes, and inform goal-directed choices.
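One way to quantify "anticorrelated population vectors" like those described above is the Pearson correlation between the ensemble firing-rate vectors recorded under each goal. This is a minimal sketch with made-up firing rates; the study's actual analysis pipeline is not specified here:

```python
import math

def pearson(u, v):
    """Pearson correlation between two population firing-rate vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical mean firing rates (Hz) for four neurons under two goals:
goal_A = [12.0, 2.0, 9.0, 1.0]
goal_B = [1.5, 10.0, 2.5, 11.0]  # neurons active for goal A are quiet for goal B
r = pearson(goal_A, goal_B)      # strongly negative: anticorrelated vectors
```

A correlation near -1 between goal-specific vectors corresponds to the stable, anticorrelated ensemble states reported after long runs of rewarded trials, whereas correlations near 0 correspond to the decorrelated state after rapid switches.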
Project description: In intertemporal choices between immediate and delayed rewards, people tend to prefer immediate rewards, often even when the delayed reward is larger. This is known as temporal discounting. It has been proposed that this tendency emerges because immediate rewards are more emotionally arousing than delayed rewards. However, in our previous research, we found no evidence for this but instead found that arousal responses (indexed with pupil dilation) in intertemporal choice are context-dependent. Specifically, arousal tracks the subjective value of the more variable reward option in the paradigm, whether it is immediate or delayed. Nevertheless, people tend to choose the less variable option in the choice task. In other words, their choices are reference-dependent and depend on variance in their recent history of offers. This suggests that there may be a causal relationship between reference-dependent choice and arousal, which we investigate here by reducing arousal pharmacologically using propranolol. We show that propranolol reduces reference-dependence, leading to choices that are less influenced by recent history and more internally consistent.
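The temporal discounting mentioned above is commonly modeled with a hyperbolic discount function, V = A / (1 + kD), where A is the reward amount, D the delay, and k the discount rate. A minimal sketch (the value of k here is illustrative, not one fitted in this study):

```python
def hyperbolic_value(amount, delay, k=0.05):
    """Hyperbolic discounting: subjective value falls with delay.
    k is an illustrative discount rate; higher k means steeper discounting."""
    return amount / (1.0 + k * delay)

# A larger-later reward can be subjectively worth less than a smaller-sooner one:
now = hyperbolic_value(10, delay=0)      # 10.0
later = hyperbolic_value(20, delay=30)   # 20 / (1 + 0.05 * 30) = 8.0
```

With k = 0.05, the 20-unit reward delayed by 30 time units is discounted below the immediate 10-unit reward, reproducing the preference for immediate rewards the abstract describes.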
Project description: Theories of reward learning in neuroscience have focused on two families of algorithms thought to capture deliberative versus habitual choice. 'Model-based' algorithms compute the value of candidate actions from scratch, whereas 'model-free' algorithms make choice more efficient but less flexible by storing pre-computed action values. We examine an intermediate algorithmic family, the successor representation, which balances flexibility and efficiency by storing partially computed action values: predictions about future events. These pre-computation strategies differ in how they update their choices following changes in a task. The successor representation's reliance on stored predictions about future states predicts a unique signature of insensitivity to changes in the task's sequence of events, but flexible adjustment following changes to rewards. We provide evidence for such differential sensitivity in two behavioural studies with humans. These results suggest that the successor representation is a computational substrate for semi-flexible choice in humans, introducing a subtler, more cognitive notion of habit.
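The trade-off the abstract describes can be made concrete. The successor representation stores a matrix M of expected discounted future state occupancies, so values V = M·R re-compute instantly when rewards R change, but M itself must be relearned when transitions change. A minimal sketch under an assumed fixed policy (the two-state chain, discount factor, and iteration count are illustrative):

```python
def successor_matrix(T, gamma=0.9, iters=200):
    """Successor matrix M = (I - gamma*T)^(-1), solved by fixed-point
    iteration on M = I + gamma*T*M. T[s][s2] is the transition
    probability from state s to s2 under the current policy."""
    n = len(T)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        M = [[(1.0 if i == j else 0.0) +
              gamma * sum(T[i][k] * M[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return M

def state_values(M, R):
    """V(s) = sum over s2 of M[s][s2] * R(s2): a reward change re-evaluates
    instantly, but a transition change requires relearning M."""
    return [sum(m * r for m, r in zip(row, R)) for row in M]

# Two-state chain: state 0 leads to state 1, state 1 is absorbing.
T = [[0.0, 1.0], [0.0, 1.0]]
M = successor_matrix(T, gamma=0.5)
V_old = state_values(M, [0.0, 1.0])  # values under the original rewards
V_new = state_values(M, [1.0, 0.0])  # rewards moved: values update with no new learning
```

Because only R changed between `V_old` and `V_new`, the agent adjusts flexibly; moving the transitions instead would leave a stale M, producing exactly the insensitivity to sequence changes the studies test for.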