Cardiac Concomitants of Feedback and Prediction Error Processing in Reinforcement Learning.
ABSTRACT: Successful learning hinges on the evaluation of positive and negative feedback. We assessed differential learning from reward and punishment in a monetary reinforcement learning paradigm, together with cardiac concomitants of positive and negative feedback processing. On the behavioral level, learning from reward resulted in more advantageous behavior than learning from punishment, suggesting a differential impact of reward and punishment on successful feedback-based learning. On the autonomic level, learning and feedback processing were closely mirrored by phasic cardiac responses on a trial-by-trial basis: (1) Negative feedback was accompanied by faster and prolonged heart rate deceleration compared to positive feedback. (2) Cardiac responses shifted from feedback presentation at the beginning of learning to stimulus presentation later on. (3) Most importantly, the strength of phasic cardiac responses to the presentation of feedback correlated with the strength of prediction error signals that alert the learner to the necessity for behavioral adaptation. Considering participants' weight status and gender revealed obesity-related deficits in learning to avoid negative consequences and less consistent behavioral adaptation in women compared to men. In sum, our results provide strong new evidence for the notion that during learning phasic cardiac responses reflect an internal value and feedback monitoring system that is sensitive to the violation of performance-based expectations. Moreover, inter-individual differences in weight status and gender may affect both behavioral and autonomic responses in reinforcement-based learning.
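The trial-by-trial prediction errors referenced above are typically derived from a delta-rule model. As a minimal, purely illustrative sketch (the learning rate and outcome coding are assumptions, not the study's values), a Rescorla-Wagner update looks like:

```python
# Minimal Rescorla-Wagner sketch: the prediction error (delta) on each
# trial is the quantity such studies correlate with phasic physiological
# responses. The learning rate and outcome values are illustrative.

def rw_update(value, outcome, alpha=0.3):
    """Return (prediction_error, updated_value) for one trial."""
    delta = outcome - value          # prediction error
    value = value + alpha * delta    # value update
    return delta, value

# Simulate a block in which reward (1.0) is delivered on every trial:
v = 0.0
errors = []
for _ in range(10):
    delta, v = rw_update(v, 1.0)
    errors.append(delta)

# Prediction errors shrink as the outcome becomes expected, consistent
# with responses shifting from feedback onset to stimulus onset.
```

Under this toy schedule the first prediction error is maximal (1.0) and decays geometrically, which is why feedback-locked signals are largest early in learning.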
Project description: Impairments in flexible goal-directed decisions, often examined by reversal learning, are associated with behavioral abnormalities characterized by impulsiveness and disinhibition. Although the lateral orbital frontal cortex (OFC) has been consistently implicated in reversal learning, it is still unclear whether this region is involved in negative feedback processing, behavioral control, or both, and whether reward and punishment might have different effects on lateral OFC involvement. Using a relatively large sample (N = 47), and a categorical learning task with either monetary reward or moderate electric shock as feedback, we found overlapping activations in the right lateral OFC (and adjacent insula) for reward and punishment reversal learning when comparing correct reversal trials with correct acquisition trials, whereas we found overlapping activations in the right dorsolateral prefrontal cortex (DLPFC) when negative feedback signaled contingency change. The right lateral OFC and DLPFC also showed greater sensitivity to punishment than did their left homologues, indicating an asymmetry in how punishment is processed. We propose that the right lateral OFC and anterior insula are important for transforming affective feedback into behavioral adjustment, whereas the right DLPFC is involved in higher-level attention control. These results provide insight into the neural mechanisms of reversal learning and behavioral flexibility, which can be leveraged to understand risky behaviors among vulnerable populations.
Project description: Post-traumatic stress disorder (PTSD) symptoms include behavioral avoidance which is acquired and tends to increase with time. This avoidance may represent a general learning bias; indeed, individuals with PTSD are often faster than controls on acquiring conditioned responses based on physiologically-aversive feedback. However, it is not clear whether this learning bias extends to cognitive feedback, or to learning from both reward and punishment. Here, male veterans with self-reported current, severe PTSD symptoms (PTSS group) or with few or no PTSD symptoms (control group) completed a probabilistic classification task that included both reward-based and punishment-based trials, where feedback could take the form of reward, punishment, or an ambiguous "no-feedback" outcome that could signal either successful avoidance of punishment or failure to obtain reward. The PTSS group outperformed the control group in total points obtained; the PTSS group specifically performed better than the control group on reward-based trials, with no difference on punishment-based trials. To better understand possible mechanisms underlying observed performance, we used a reinforcement learning model of the task, and applied maximum likelihood estimation techniques to derive estimated parameters describing individual participants' behavior. Estimations of the reinforcement value of the no-feedback outcome were significantly greater in the control group than the PTSS group, suggesting that the control group was more likely to value this outcome as positively reinforcing (i.e., signaling successful avoidance of punishment). This is consistent with the control group's generally poorer performance on reward trials, where reward feedback was to be obtained in preference to the no-feedback outcome.
Differences in the interpretation of ambiguous feedback may contribute to the facilitated reinforcement learning often observed in PTSD patients, and may in turn provide new insight into how pathological behaviors are acquired and maintained in PTSD.
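The model-fitting approach described above can be sketched roughly as follows; the softmax Q-learner, the parameter names (`r_none`, `alpha`, `beta`), the toy data, and the grid search are all illustrative stand-ins, not the study's actual model or fitting procedure:

```python
import math

# Hypothetical sketch of the maximum-likelihood fit described above:
# choices are modeled by a softmax Q-learner, and the reinforcement
# value of the ambiguous "no-feedback" outcome is a free parameter
# (r_none) estimated from the data. All names and values are illustrative.

def neg_log_likelihood(choices, outcomes, r_none, alpha=0.3, beta=3.0):
    """choices: list of 0/1 picks; outcomes: 'reward', 'punish', or 'none'."""
    reward_map = {"reward": 1.0, "punish": -1.0, "none": r_none}
    q = [0.0, 0.0]
    nll = 0.0
    for c, o in zip(choices, outcomes):
        # softmax probability of the option actually chosen
        p = math.exp(beta * q[c]) / sum(math.exp(beta * qi) for qi in q)
        nll -= math.log(p)
        q[c] += alpha * (reward_map[o] - q[c])   # delta-rule update
    return nll

# Crude grid search over r_none as an MLE stand-in. This toy learner
# keeps repeating a choice that yields "no feedback", a pattern best
# explained by a positive value for that outcome.
choices  = [0, 0, 1, 0, 0, 0]
outcomes = ["none", "none", "punish", "none", "none", "none"]
best = min((neg_log_likelihood(choices, outcomes, r), r)
           for r in [-1.0, -0.5, 0.0, 0.5, 1.0])
best_r_none = best[1]   # positive for this toy dataset
```

In a real fit the grid search would be replaced by a proper optimizer and the likelihood evaluated per participant, but the logic of estimating the no-feedback value is the same.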
Project description: Both positive psychotic symptoms and anhedonia are associated with striatal functioning, but few studies have linked risk for psychotic disorders to a neural measure evoked during a striatal dopamine-related reward and punishment-based learning task, such as a reversal learning task (RLT; Cools et al., 2009). The feedback-related negativity (FRN) is a neural response that in part reflects striatal dopamine functioning. We recorded EEG during the RLT in three groups: (a) people with psychotic experiences (PE; n=20) at increased risk for psychotic disorders; (b) people with extremely elevated social anhedonia (SocAnh; n=22); and (c) controls (n=20). Behaviorally, consistent with increased striatal dopamine, the PE group exhibited better behavioral learning (i.e., faster responses) after unexpected reward than after unexpected punishment. Moreover, although the control and SocAnh groups showed a larger FRN to punishment than reward, the PE group showed similar FRNs to punishment and reward, with a numerically larger FRN to reward than punishment (with similar results on these trials also found for a P3a component). These results are among the first to link a neural response evoked by a reward and punishment-based learning task specifically with elevated psychosis risk.
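The FRN is commonly quantified as a mean amplitude in a post-feedback time window on the punishment-minus-reward difference wave. A hedged sketch of that quantification, in which the sampling rate, window, and toy ERPs are all assumptions rather than this study's parameters:

```python
# Illustrative sketch of one common FRN quantification: mean amplitude
# of the punishment-minus-reward difference wave in a post-feedback
# window. Sampling rate, window, and the toy ERPs are assumptions.

FS = 500  # Hz, assumed sampling rate; sample 0 = feedback onset

def mean_amplitude(erp, t_start, t_end, fs=FS):
    """Mean amplitude (in microvolts) between t_start and t_end seconds."""
    i0, i1 = int(t_start * fs), int(t_end * fs)
    window = erp[i0:i1]
    return sum(window) / len(window)

# Toy averaged ERPs (1-second epochs): punishment shows a more negative
# deflection around 300 ms than reward.
reward_erp = [0.0] * FS
punish_erp = [0.0] * FS
for i in range(int(0.25 * FS), int(0.35 * FS)):
    punish_erp[i] = -5.0   # microvolts

diff = [p - r for p, r in zip(punish_erp, reward_erp)]
frn = mean_amplitude(diff, 0.25, 0.35)
# frn is negative here, the typical direction of the FRN effect
```

A "larger FRN to punishment than reward" then corresponds to a more negative value of this difference-wave measure.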
Project description: Prior research has shown that the ratio between resting-state theta (4-7 Hz) and beta (13-30 Hz) oscillations in the electroencephalogram (EEG) is associated with reward- and punishment-related feedback learning and risky decision making. However, it remains unclear whether the theta/beta EEG ratio is also an electrophysiological index of poorer behavioral adaptation when reward and punishment contingencies change over time. The aim of the present study was to investigate whether the resting-state theta/beta EEG ratio correlated with reversal learning. A 4-min resting-state EEG was recorded and a gambling task with changing reward-punishment contingencies was administered in 128 healthy volunteers. Results showed an inverse relationship between the theta/beta EEG ratio and reversal learning. Our findings replicate and extend previous findings by showing that higher midfrontal theta/beta EEG ratios are associated with poorer reversal learning and behavioral adaptation under changing environmental demands.
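A theta/beta ratio of this kind is simply a ratio of band powers. As an illustrative sketch (the toy spectrum and its 1 Hz resolution are assumptions; real pipelines estimate the power spectrum from the recorded EEG, e.g. with Welch's method):

```python
# Hedged sketch of the theta/beta ratio from a resting-state power
# spectrum. The spectrum here is a toy frequency->power mapping, not
# real EEG data.

THETA = (4, 7)    # Hz
BETA  = (13, 30)  # Hz

def band_power(spectrum, band):
    """Sum power over all frequencies falling inside the band."""
    lo, hi = band
    return sum(p for f, p in spectrum.items() if lo <= f <= hi)

def theta_beta_ratio(spectrum):
    return band_power(spectrum, THETA) / band_power(spectrum, BETA)

# Toy 1-Hz-resolution spectrum with strong theta relative to beta:
toy_spectrum = {f: (6.0 if 4 <= f <= 7 else 1.0) for f in range(1, 31)}
ratio = theta_beta_ratio(toy_spectrum)
# theta power = 4 * 6 = 24; beta power = 18 * 1 = 18
```

The study's inverse relationship means that participants with larger values of this ratio at rest tended to adapt more poorly when contingencies reversed.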
Project description: RATIONALE: During value-based decision-making, organisms make choices on the basis of reward expectations, which have been formed during prior action-outcome learning. Although it is known that neuronal manipulations of different subregions of the rat prefrontal cortex (PFC) have qualitatively different effects on behavioral tasks involving value-based decision-making, it is unclear how these regions contribute to the underlying component processes. OBJECTIVES: To assess how different regions of the rodent PFC contribute to component processes of value-based decision-making, including reward (or positive feedback) learning, punishment (or negative feedback) learning, response persistence, and exploration versus exploitation. METHODS: We performed behavioral modeling of data from rats in a probabilistic reversal learning task after pharmacological inactivation of five PFC subregions, to assess how inactivation of these different regions affected the structure of the animals' responding in the task. RESULTS: Our results show reductions in reward and punishment learning after PFC subregion inactivation. The prelimbic, infralimbic, lateral orbital, and medial orbital PFC particularly contributed to punishment learning, and the prelimbic and lateral orbital PFC to reward learning. In addition, response persistence depended on the infralimbic and medial orbital PFC. As a result, pharmacological inactivation of the infralimbic and lateral orbitofrontal cortex reduced the number of reversals achieved, whereas inactivation of the prelimbic and medial orbitofrontal cortex decreased the number of rewards obtained. Finally, using simulated data, we explain discrepancies with a previous study and demonstrate complex, interacting relationships between conventional measures of probabilistic reversal learning performance, such as win-stay/lose-switch behavior, and component processes of value-based decision-making.
CONCLUSIONS: Together, our data suggest that distinct components of value-based learning and decision-making are generated in medial and orbital PFC regions, displaying functional specialization and overlap, with a prominent role of large parts of the PFC in negative feedback processing.
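The component processes and conventional measures contrasted above can be sketched side by side: a dual-learning-rate update (separate rates for positive and negative feedback) and win-stay/lose-switch scoring. All rates and data below are illustrative, not fitted values from the study:

```python
# Sketch of a dual-learning-rate value update alongside conventional
# win-stay / lose-switch scoring. Learning rates and the toy choice
# sequence are illustrative assumptions.

def update(q, choice, outcome, alpha_pos=0.4, alpha_neg=0.1):
    """Update the chosen option's value, with valence-specific rates."""
    delta = outcome - q[choice]
    rate = alpha_pos if delta >= 0 else alpha_neg
    q[choice] += rate * delta
    return q

def win_stay_lose_switch(choices, outcomes):
    """Proportions of stay-after-reward and switch-after-nonreward."""
    ws = [c2 == c1 for c1, c2, o in zip(choices, choices[1:], outcomes) if o == 1]
    ls = [c2 != c1 for c1, c2, o in zip(choices, choices[1:], outcomes) if o == 0]
    stay = sum(ws) / len(ws) if ws else None
    switch = sum(ls) / len(ls) if ls else None
    return stay, switch

q = update([0.0, 0.0], 0, 1.0)       # rewarded choice: large positive-rate step
choices  = [0, 0, 1, 1, 0]
outcomes = [1, 0, 1, 1, 1]           # 1 = reward, 0 = non-reward/punishment
stay, switch = win_stay_lose_switch(choices, outcomes)
```

Because win-stay/lose-switch collapses over the latent values that the learning rates shape, two animals with the same stay/switch proportions can differ in their underlying component processes, which is the kind of dissociation the simulated-data analysis addresses.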
Project description: It is well established that Parkinson's disease leads to impaired learning from reward and enhanced learning from punishment. The administration of dopaminergic medications reverses this learning pattern. However, few studies have investigated the neural underpinnings of these cognitive processes. In this study, using fMRI, we tested a group of Parkinson's disease patients on and off dopaminergic medications and matched healthy individuals. All individuals completed an fMRI cognitive task that dissociates feedback learning from reward versus punishment. The administration of dopaminergic medications attenuated blood oxygen level dependent (BOLD) responses to punishment in the bilateral putamen, in bilateral dorsolateral prefrontal cortex and the left premotor cortex. Further, the administration of dopaminergic medications resulted in a higher ratio of BOLD activity between reward and punishment trials in these brain areas. BOLD activity in these brain areas was significantly correlated with learning from punishment, but not from reward trials. Furthermore, the administration of dopaminergic medications altered BOLD activity in the right insula and ventromedial prefrontal cortex when Parkinson's disease patients were anticipating feedback. These findings are in agreement with a large body of literature indicating that Parkinson's disease is associated with enhanced learning from punishment. However, it was surprising that dopaminergic medications modulated punishment learning as opposed to reward learning, although reward learning has been directly linked to dopaminergic function. We argue that these results might be attributed to both a change in the balance between direct and indirect pathway activation in the basal ganglia as well as the differential activity of D1 versus D2 dopamine receptors.
Project description: Reward and punishment are potent modulators of associative learning in instrumental and classical conditioning. However, the effect of reward and punishment on procedural learning is not known. The striatum is known to be an important locus of reward-related neural signals and part of the neural substrate of procedural learning. Here, using an implicit motor learning task, we show that reward leads to enhancement of learning in human subjects, whereas punishment is associated only with improvement in motor performance. Furthermore, these behavioral effects have distinct neural substrates with the learning effect of reward being mediated through the dorsal striatum and the performance effect of punishment through the insula. Our results suggest that reward and punishment engage separate motivational systems with distinctive behavioral effects and neural substrates.
Project description: In real-world settings, learning is often characterised as intentional: learners are aware of the goal during the learning process, and the goal of learning is readily dissociable from the awareness of what is learned. Recent evidence has shown that reward and punishment (collectively referred to as valenced feedback) are important factors that influence performance during learning. Presently, however, studies investigating the impact of valenced feedback on skill learning have only considered unintentional learning, and therefore the interaction between intentionality and valenced feedback has not been systematically examined. The present study investigated how reward and punishment impact behavioural performance when participants are instructed to learn in a goal-directed fashion (i.e. intentionally) rather than unintentionally. In Experiment 1, participants performed the serial response time task with reward, punishment, or control feedback and were instructed to ignore the presence of the sequence, i.e., learn unintentionally. Experiment 2 followed the same design, but participants were instructed to intentionally learn the sequence. We found that punishment significantly benefitted performance during learning only when participants learned unintentionally, and we observed no effect of punishment when participants learned intentionally. Thus, the impact of feedback on performance may be influenced by the goal of the learner.
Project description: Dopamine (DA) is critical for reward processing, but significantly less is known about its role in punishment avoidance. Using a combined approach-avoidance task, we measured phasic DA release in the nucleus accumbens (NAc) of rats during presentation of cues that predicted reward, punishment or neutral outcomes and investigated individual differences based on avoidance performance. Here we show that DA release within a single microenvironment is higher for reward and avoidance cues compared with neutral cues and positively correlated with poor avoidance behaviour. We found that DA release delineates trial type during sessions with good avoidance but is non-selective during poor avoidance, with high release correlating with poor performance. These data demonstrate that phasic DA is released during cued approach and avoidance within the same microenvironment and that abnormal processing of value signals is correlated with poor performance.
Project description: Adolescence is a period of life characterised by changes in learning and decision-making. Learning and decision-making do not rely on a unitary system, but instead require the coordination of different cognitive processes that can be mathematically formalised as dissociable computational modules. Here, we aimed to trace the developmental time-course of the computational modules responsible for learning from reward or punishment, and learning from counterfactual feedback. Adolescents and adults carried out a novel reinforcement learning paradigm in which participants learned the association between cues and probabilistic outcomes, where the outcomes differed in valence (reward versus punishment) and feedback was either partial or complete (either the outcome of the chosen option only, or the outcomes of both the chosen and unchosen option, were displayed). Computational strategies changed during development: whereas adolescents' behaviour was better explained by a basic reinforcement learning algorithm, adults' behaviour integrated increasingly complex computational features, namely a counterfactual learning module (enabling enhanced performance in the presence of complete feedback) and a value contextualisation module (enabling symmetrical reward and punishment learning). Unlike adults, adolescent performance did not benefit from counterfactual (complete) feedback. In addition, while adults learned symmetrically from both reward and punishment, adolescents learned from reward but were less likely to learn from punishment. This tendency to rely on rewards and not to consider alternative consequences of actions might contribute to our understanding of decision-making in adolescence.
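The counterfactual learning module described above amounts to also updating the unchosen option from its foregone outcome when complete feedback is shown. A minimal illustrative sketch (learning rates and outcome coding are assumptions, not the study's fitted model):

```python
# Illustrative sketch of counterfactual learning: with complete
# feedback, the unchosen option is updated from the foregone outcome,
# here with its own learning rate. Parameter values are assumptions.

def factual_update(q, chosen, outcome, alpha=0.3):
    """Standard delta-rule update of the chosen option."""
    q[chosen] += alpha * (outcome - q[chosen])

def counterfactual_update(q, unchosen, foregone, alpha_cf=0.3):
    """Update the unchosen option from the outcome it would have given."""
    q[unchosen] += alpha_cf * (foregone - q[unchosen])

q = [0.0, 0.0]
# Complete-feedback trial: option 0 chosen and rewarded (+1), while the
# display shows option 1 would have been punished (-1).
factual_update(q, 0, 1.0)
counterfactual_update(q, 1, -1.0)
# Both option values move after a single trial, which is what lets
# counterfactual learners exploit complete feedback faster; a basic
# learner would have updated only q[0].
```

Dropping the counterfactual update leaves the unchosen value untouched, mirroring the adolescent pattern in which complete feedback conferred no performance benefit.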