Project description:We introduce reinforcement quantum annealing (RQA), a scheme in which an intelligent agent searches the space of Hamiltonians and interacts with a quantum annealer that plays the role of the stochastic environment in learning automata. At each iteration of RQA, after analyzing the results (samples) from the previous iteration, the agent adjusts the penalties of unsatisfied constraints and re-casts the given problem as a new Ising Hamiltonian. As a proof of concept, we propose a novel approach for casting the Boolean satisfiability problem (SAT) to Ising Hamiltonians and show how to apply RQA to increase the probability of finding the global optimum. Our experimental results on two benchmark SAT problems (factoring pseudo-prime numbers and random SAT with phase transitions), using a D-Wave 2000Q quantum processor, demonstrate that RQA finds notably better solutions with fewer samples than the best-known techniques in quantum annealing.
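The reinforcement loop described above (sample, identify unsatisfied constraints, boost their penalties, re-anneal) can be sketched in plain Python. This is a toy illustration, not the paper's actual casting: random sampling stands in for the D-Wave annealer, and the clause encoding and `boost` schedule are assumptions.

```python
import random

def clause_penalty(assignment, clause):
    """Penalty 1 if the OR-clause is unsatisfied, else 0.
    A clause is a list of (var_index, is_negated) literals."""
    satisfied = any(assignment[v] != neg for v, neg in clause)
    return 0 if satisfied else 1

def energy(assignment, clauses, weights):
    """Weighted penalty Hamiltonian: energy 0 iff all clauses are satisfied."""
    return sum(w * clause_penalty(assignment, c) for c, w in zip(clauses, weights))

def rqa_style_loop(clauses, n_vars, iterations=20, samples_per_iter=50, boost=1.0, seed=0):
    """Toy reinforcement loop: random sampling stands in for the quantum annealer;
    the agent boosts the penalty weight of every clause the best sample leaves
    unsatisfied, so persistently violated constraints dominate the next Hamiltonian."""
    rng = random.Random(seed)
    weights = [1.0] * len(clauses)
    unit = [1.0] * len(clauses)  # unweighted energy = number of unsatisfied clauses
    best = None
    for _ in range(iterations):
        samples = [[rng.randint(0, 1) for _ in range(n_vars)]
                   for _ in range(samples_per_iter)]
        top = min(samples, key=lambda a: energy(a, clauses, weights))
        if best is None or energy(top, clauses, unit) < energy(best, clauses, unit):
            best = top
        for i, c in enumerate(clauses):
            if clause_penalty(top, c):
                weights[i] += boost  # reinforce constraints the sampler keeps violating
    return best, weights
```

In a real RQA run the weighted penalties would be re-expanded into Ising couplings before each anneal; here `energy` evaluates them directly.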
Project description:In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youths, because they have less experience to draw on than adults. Some studies suggest probabilistic learning may be less efficient in youths than in adults, while others suggest it may be more efficient in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youths aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants' performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase with age was driven by (1) an increase in learning rate (i.e., a decrease in integration time scale) and (2) a decrease in noisy/exploratory choices. In mid-adolescence (ages 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
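A minimal sketch of the class of model favored by the data above: a Rescorla-Wagner learner with separate learning rates for positive and negative prediction errors, plus a softmax choice rule whose inverse temperature controls choice noise. The function names and parameter values are illustrative, not the fitted models from the study.

```python
import math
import random

def rw_update(q, reward, alpha_pos, alpha_neg):
    """Rescorla-Wagner update with separate learning rates for positive and
    negative prediction errors; alpha_neg near 0 means negative outcomes
    have minimal impact on learning."""
    delta = reward - q
    return q + (alpha_pos if delta > 0 else alpha_neg) * delta

def softmax_choice(q_values, beta, rng):
    """Noisy softmax choice; lower beta yields noisier/more exploratory choices."""
    exps = [math.exp(beta * v) for v in q_values]
    r = rng.random() * sum(exps)
    for i, e in enumerate(exps):
        r -= e
        if r <= 0:
            return i
    return len(q_values) - 1

def simulate(p_reward, alpha_pos, alpha_neg, beta, n_trials=500, seed=1):
    """Two-armed bandit with stable probabilistic contingencies; returns the
    fraction of trials on which the better arm (index 0) was chosen."""
    rng = random.Random(seed)
    q = [0.5, 0.5]
    chose_best = 0
    for _ in range(n_trials):
        a = softmax_choice(q, beta, rng)
        reward = 1.0 if rng.random() < p_reward[a] else 0.0
        q[a] = rw_update(q[a], reward, alpha_pos, alpha_neg)
        chose_best += (a == 0)
    return chose_best / n_trials
```

Raising `alpha_pos` (faster integration) or `beta` (less exploration) in this sketch improves accuracy, mirroring the two age-related drivers reported above.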
Project description:Theoretical models of bipolar disorder (BD) posit core deficits in reward system function. However, specifying which of the reward system's multiple neurobehavioral processes are abnormal in BD is necessary to develop appropriately targeted interventions. Research on probabilistic reinforcement learning deficits in BD is limited, particularly during adolescence, a period of significant neurodevelopmental change in the reward system. The present study investigated probabilistic reinforcement learning, using a probabilistic selection task (PST), and its correlates, using self-reported reward/threat sensitivities and cognitive tasks, in 104 adolescents with and without BD. Compared with healthy peers, adolescents with BD were less likely to persist with their choices after prior positive feedback (i.e., they had lower win-stay rates) in the PST's acquisition phase. Across groups, a greater win-stay rate appeared to be a more efficient learning strategy, associated with fewer acquisition trials and better testing-phase performance. Win-stay rates were also related to verbal learning indices, but not to self-reported reward/threat sensitivities. Finally, lower win-stay rates had significant incremental validity in predicting a BD diagnosis after accounting for the effects of current symptoms, reward sensitivities, verbal learning, and IQ. The present findings support multiple dysfunctional reward system processes in adolescent BD that warrant further examination.
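Win-stay (and the complementary lose-shift) rates can be computed directly from trial sequences. A minimal sketch with hypothetical trial data; the study's exact scoring may differ.

```python
def win_stay_lose_shift_rates(choices, rewards):
    """Win-stay rate: P(repeat previous choice | previous trial rewarded).
    Lose-shift rate: P(switch choice | previous trial unrewarded)."""
    win_stay = win_n = lose_shift = lose_n = 0
    for prev_c, prev_r, cur_c in zip(choices, rewards, choices[1:]):
        if prev_r:
            win_n += 1
            win_stay += (cur_c == prev_c)
        else:
            lose_n += 1
            lose_shift += (cur_c != prev_c)
    win_stay_rate = win_stay / win_n if win_n else float("nan")
    lose_shift_rate = lose_shift / lose_n if lose_n else float("nan")
    return win_stay_rate, lose_shift_rate
```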
Project description:Despite progress in smoking reduction over the past several decades, cigarette smoking remains a significant public health concern worldwide, with many smokers attempting but ultimately failing to maintain abstinence. However, little is known about how decision-making evolves in quitting smokers. Based on preregistered hypotheses and an analysis plan ( https://osf.io/yq5th ), we examined the evolution of reinforcement learning (RL), a key component of decision-making, in smokers during acute and extended nicotine abstinence. In a longitudinal, within-subject design, we used a probabilistic reward task (PRT) to assess RL in twenty smokers who successfully refrained from smoking for at least 30 days. We evaluated changes in reward-based decision-making using signal-detection analysis and five RL models across three sessions during 30 days of nicotine abstinence. Contrary to our preregistered hypothesis, punishment sensitivity emerged as the only parameter that changed during smoking cessation. While it is plausible that some changes in task performance could be attributed to task repetition effects, we observed a clear impact of the nicotine withdrawal syndrome (NWS) on RL, and a dynamic relationship between craving and reward/punishment sensitivity over time, suggesting a significant recalibration of cognitive processes during abstinence. In this context, the heightened sensitivity to negative outcomes observed at the last session (30 days after quitting), compared with the previous sessions, may be interpreted as a cognitive adaptation aimed at fostering long-term abstinence. While further studies are needed to clarify the mechanisms underlying punishment sensitivity during nicotine abstinence, these results highlight the need for personalized treatment approaches tailored to individual needs.
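The PRT is conventionally scored with signal-detection measures. A sketch of the standard log-transformed response bias and discriminability formulas, assuming the usual rich/lean correct and incorrect cell counts; this is the textbook computation, not necessarily the study's exact pipeline.

```python
import math

def response_bias(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """log b: positive values indicate a bias toward the more
    frequently rewarded ('rich') stimulus."""
    return 0.5 * math.log((rich_correct * lean_incorrect)
                          / (rich_incorrect * lean_correct))

def discriminability(rich_correct, rich_incorrect, lean_correct, lean_incorrect):
    """log d: overall ability to distinguish the two stimuli, independent of bias."""
    return 0.5 * math.log((rich_correct * lean_correct)
                          / (rich_incorrect * lean_incorrect))
```

In practice a small correction (e.g. adding 0.5 to each cell) is often applied so zero counts do not make the log undefined.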
Project description:Summary: Electroactive polymer (EAP) hydrogels are an active-matter material used as actuators in soft robotics. Hydrogels exhibit active-matter behavior through a form of memory and can be used to embody memory systems such as automata. This study exploited EAP responses, finding that EAP memory functions could be utilized for automaton and reservoir computing frameworks. Under sequential electrical stimulation, the mechanical responses of EAPs were represented in a probabilistic Moore automaton framework and expanded through shaping the reservoir's energy landscape. The EAP automaton reservoir's computational ability was compared with digital computation to assess EAPs as computational resources. We found that the computation in the EAP's reaction to stimuli can be represented through automaton structures, revealing a potential bridge between an EAP's use as an integrated actuator and controller; i.e., our automaton framework could potentially lead to control systems in which the computation is embedded in the medium's dynamical responses.
Highlights:
• EAP gel memory mechanics were demonstrated via voltage potential measurements
• Probabilistic Moore automata were constructed from EAP gel responses to stimulation
• Through tuning of response encoding, a computational reservoir was created
• The reservoir was shown to be more memory efficient than general digital alternatives
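A probabilistic Moore automaton, i.e. states that emit outputs, with a probability distribution over next states for each (state, input) pair, can be sketched as follows. The states, symbols, and outputs here are hypothetical stand-ins for the encoded EAP responses.

```python
import random

class ProbabilisticMooreAutomaton:
    """Moore machine whose output depends only on the current state, with
    probabilistic transitions: (state, symbol) -> list of (next_state, prob)."""

    def __init__(self, transitions, outputs, start):
        self.transitions = transitions  # dict: (state, symbol) -> [(next_state, prob), ...]
        self.outputs = outputs          # dict: state -> output
        self.state = start

    def step(self, symbol, rng):
        """Sample the next state from the transition distribution, return its output."""
        r = rng.random()
        acc = 0.0
        for nxt, p in self.transitions[(self.state, symbol)]:
            acc += p
            if r < acc:
                self.state = nxt
                break
        return self.outputs[self.state]

    def run(self, symbols, rng):
        """Drive the automaton with an input sequence; collect the output sequence."""
        return [self.step(s, rng) for s in symbols]
```

With degenerate (probability-1) transitions this reduces to an ordinary deterministic Moore machine, which makes the behavior easy to check.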
Project description:Background: Anhedonia (a reduced experience of pleasure) and avolition (a reduction in goal-directed activity) are common features of schizophrenia that have substantial effects on functional outcome but are poorly understood and treated. Here, we examined whether alterations in reinforcement learning may contribute to these symptoms in schizophrenia by impairing the translation of reward information into goal-directed action. Methods: 38 stable outpatients with schizophrenia or schizoaffective disorder and 37 healthy controls underwent fMRI during a probabilistic stimulus selection reinforcement learning task with dissociated choice- and feedback-related activation, followed by a behavioral transfer task allowing separate assessment of learning from positive versus negative outcomes. A Q-learning algorithm was used to examine functional activation relating to prediction error at the time of feedback and to expected value at the time of choice. Results: Behavioral results suggested a reduction in learning from positive feedback in patients; however, this reduction was unrelated to anhedonia/avolition severity. In the fMRI analysis, prediction error-related activation at the time of feedback was highly similar between patients and controls. During early learning, patients activated regions in the cognitive control network to a lesser extent than controls. Correlation analyses revealed reduced responses to positive feedback in dorsolateral prefrontal cortex and caudate among patients higher in anhedonia/avolition. Conclusions: Together, these results suggest that anhedonia/avolition is as strongly related to cortical learning, or to higher-level processes involved in goal-directed behavior such as effort computation and planning, as to striatally mediated learning mechanisms.
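The Q-learning quantities described above, expected value at the time of choice and prediction error at the time of feedback, can be generated trial by trial as follows. A minimal sketch under standard delta-rule assumptions, not the study's fitted model.

```python
def q_regressors(actions, rewards, n_actions, alpha):
    """Trial-by-trial expected value (at choice) and prediction error (at feedback)
    from a delta-rule Q-learner, of the kind used as parametric fMRI regressors."""
    q = [0.0] * n_actions
    evs, pes = [], []
    for a, r in zip(actions, rewards):
        ev = q[a]           # expected value at the time of choice
        pe = r - ev         # reward prediction error at the time of feedback
        q[a] = ev + alpha * pe
        evs.append(ev)
        pes.append(pe)
    return evs, pes
```

The `evs` and `pes` sequences would then be convolved with a hemodynamic response function and entered as parametric modulators in the fMRI design matrix.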
Project description:Schizophrenia spectrum disorders (SZ) are characterized by impairments in probabilistic reinforcement learning (RL), which is associated with dopaminergic circuitry encompassing the prefrontal cortex and basal ganglia. However, no studies have examined dopaminergic genes with respect to probabilistic RL in SZ. Thus, the aim of our study was to examine the impact of dopaminergic genes on performance in the Probabilistic Selection Task (PST) in patients with SZ compared to healthy control (HC) subjects. We included 138 SZ patients and 188 HC participants. Genetic analysis was performed for the following polymorphisms: rs4680 in COMT; rs907094 in DARPP-32; rs2734839, rs936461, rs1800497, and rs6277 in DRD2; rs747302 and rs1800955 in DRD4; and rs28363170 and rs2975226 in DAT1. The probabilistic RL task was completed by 59 SZ patients and 95 HC subjects. SZ patients performed significantly worse than HCs in acquiring reinforcement contingencies during the task. We found no significant association between genetic polymorphisms and RL among SZ patients; however, among HC participants, with respect to the DAT1 rs28363170 polymorphism, individuals with 10-repeat genotypes performed better than 9-repeat allele carriers. The present study indicates the relevance of the DAT1 rs28363170 polymorphism to RL in HC participants.
Project description:We present an algorithm for active learning of deterministic timed automata with a single clock. The algorithm is within the framework of Angluin's L* algorithm.
Project description:Rapid advancements in deep learning over the past decade have fueled an insatiable demand for efficient and scalable hardware. Photonics offers a promising solution by leveraging the unique properties of light. However, conventional neural network architectures, which typically require dense programmable connections, pose several practical challenges for photonic realizations. To overcome these limitations, we propose and experimentally demonstrate Photonic Neural Cellular Automata (PNCA) for photonic deep learning with sparse connectivity. PNCA harnesses the speed and interconnectivity of photonics, as well as the self-organizing nature of cellular automata through local interactions, to achieve robust, reliable, and efficient processing. We utilize linear light interference and parametric nonlinear optics for all-optical computations in a time-multiplexed photonic network to experimentally perform self-organized image classification. We demonstrate binary (two-class) classification of images using as few as 3 programmable photonic parameters, achieving high experimental accuracy with the ability to also recognize out-of-distribution data. The proposed PNCA approach can be adapted to a wide range of existing photonic hardware and provides a compelling alternative to conventional photonic neural networks by maximizing the advantages of light-based computing whilst mitigating its practical challenges. Our results showcase the potential of PNCA in advancing photonic deep learning and highlight a path for next-generation photonic computers.
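The local-interaction computation that cellular automata provide can be illustrated in a purely classical toy form with an elementary cellular automaton, where each cell updates from only its three-cell neighborhood. This fixed-rule digital sketch is only an analogy; the photonic system described above is analog and trainable.

```python
def step_ca(cells, rule):
    """One synchronous update of an elementary cellular automaton with periodic
    boundaries: each cell's next state depends only on its 3-cell neighborhood,
    looked up as a bit index into the 8-bit rule number."""
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
            for i in range(n)]
```

Iterating `step_ca` propagates information purely through local interactions, the same sparse-connectivity principle PNCA exploits in place of dense programmable weights.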
Project description:Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher-level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and the delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework, even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report the effects of two genetic polymorphisms with relative specificity for prefrontal and basal ganglia functions. Whereas the COMT gene, coding for catechol-O-methyltransferase, selectively influenced model estimates of WM capacity, the GPR6 gene, coding for G-protein-coupled receptor 6, influenced the RL learning rate. Thus, this study allowed us to specify distinct influences of high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models.
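A common formalization of capacity-limited WM mixed with incremental RL, in the style this abstract describes, weights a near-deterministic WM policy against an RL softmax, with reliance on WM scaled down when the set size exceeds capacity. The function names and parameters below are illustrative assumptions, not the paper's exact model.

```python
import math

def softmax(values, beta):
    """Softmax policy with inverse temperature beta."""
    exps = [math.exp(beta * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def rlwm_weight(rho, capacity, set_size):
    """WM reliance: full weight rho when the stimulus set fits in WM,
    scaled by capacity/set-size when it does not."""
    return rho * min(1.0, capacity / set_size)

def rlwm_choice_probs(q_values, wm_action, n_actions, beta, w):
    """Mixture policy: weight w on deterministic WM recall of the last correct
    action (uniform if the stimulus is not held in WM), 1 - w on the RL softmax."""
    p_rl = softmax(q_values, beta)
    if wm_action is None:
        p_wm = [1.0 / n_actions] * n_actions
    else:
        p_wm = [1.0 if a == wm_action else 0.0 for a in range(n_actions)]
    return [w * pw + (1 - w) * pr for pw, pr in zip(p_wm, p_rl)]
```

Because `rlwm_weight` shrinks with set size, the mixture predicts fast, load-sensitive learning early (WM-driven) converging to slower incremental learning (RL-driven), which is the entanglement the task was designed to expose.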