Project description: Sepsis is a potentially life-threatening inflammatory response to infection or severe tissue damage. It has a highly variable clinical course, requiring constant monitoring of the patient's state to guide the management of intravenous fluids and vasopressors, among other interventions. Despite decades of research, there is still debate among experts on the optimal treatment. Here, we combine, for the first time, distributional deep reinforcement learning with mechanistic physiological models to find personalized sepsis treatment strategies. Our method handles partial observability by leveraging known cardiovascular physiology through a novel physiology-driven recurrent autoencoder, and it quantifies the uncertainty of its own results. Moreover, we introduce a framework for uncertainty-aware decision support with humans in the loop. We show that our method learns physiologically explainable, robust policies that are consistent with clinical knowledge. Further, our method consistently identifies high-risk states that lead to death, which could potentially benefit from more frequent vasopressor administration, providing valuable guidance for future research.
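To make the distributional idea concrete, here is a minimal sketch of quantile-based value learning on a toy two-treatment problem. The action names, outcome distributions, and the tail-based comparison are invented for illustration; the paper's actual method, physiological model, and recurrent autoencoder are not reproduced here.

```python
import random

random.seed(0)
N_QUANTILES = 11
ALPHA = 0.05
taus = [(i + 0.5) / N_QUANTILES for i in range(N_QUANTILES)]

# One quantile vector per action; scalar rewards stand in for treatment outcomes.
theta = {a: [0.0] * N_QUANTILES for a in ("fluids", "vasopressor")}

def sample_outcome(action):
    # Hypothetical outcome distributions, purely for illustration.
    return random.gauss(1.0, 2.0) if action == "fluids" else random.gauss(1.2, 4.0)

for _ in range(20000):
    action = random.choice(list(theta))
    r = sample_outcome(action)
    for i, tau in enumerate(taus):
        # Quantile-regression update: theta_i drifts toward the tau-th quantile.
        indicator = 1.0 if r < theta[action][i] else 0.0
        theta[action][i] += ALPHA * (tau - indicator)

# Risk-sensitive comparison: look at the lower tail, not just the mean.
for a, q in theta.items():
    tail = sorted(q)[: N_QUANTILES // 4 + 1]
    print(a, "| mean:", round(sum(q) / len(q), 2),
          "| lower tail:", round(sum(tail) / len(tail), 2))
```

Because the learner keeps a full quantile vector per action rather than a single mean, a decision-support layer can flag actions whose lower tail looks dangerous even when the mean looks acceptable.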
Project description: What determines the speed of our decisions? Various models of decision-making have focused on perceptual evidence, past experience, and task complexity as important factors determining the degree of deliberation needed for a decision. Here, we build on a sequential sampling decision-making framework to develop a new model that captures a range of reaction time (RT) effects by accounting for both working memory and instrumental learning processes. The model captures choices and RTs at various stages of learning, and in learning environments with varying complexity. Moreover, the model generalizes from tasks with deterministic reward contingencies to probabilistic ones. The model succeeds in part by incorporating prior uncertainty over actions when modeling RT. This straightforward process model provides a parsimonious account of decision dynamics during instrumental learning and makes unique predictions about internal representations of action values.
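As a rough illustration of that last point, the sketch below ties a predicted RT to the entropy of a softmax policy during simple instrumental learning: as action values separate, uncertainty over actions falls and the predicted RT shrinks. All parameters and the linear RT-entropy link are assumptions for the demo, not the paper's model.

```python
import math, random

random.seed(0)
ALPHA, BETA = 0.3, 4.0   # learning rate and softmax inverse temperature (assumed)
BASE_RT, K = 0.3, 0.4    # hypothetical RT intercept and entropy slope
Q = [0.0, 0.0]           # values of two candidate actions

def softmax(q):
    m = max(q)
    e = [math.exp(BETA * (x - m)) for x in q]
    return [x / sum(e) for x in e]

for trial in range(101):
    p = softmax(Q)
    entropy = -sum(pi * math.log(pi) for pi in p)
    predicted_rt = BASE_RT + K * entropy              # more uncertainty, slower response
    a = 0 if random.random() < p[0] else 1
    r = 1.0 if (a == 0 and random.random() < 0.8) else 0.0  # action 0 pays off more
    Q[a] += ALPHA * (r - Q[a])                        # delta-rule value update
    if trial % 25 == 0:
        print(f"trial {trial:3d}  predicted RT {predicted_rt:.3f}  P(best)={p[0]:.2f}")
```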
Project description: In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) in high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction process. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with the largest marginal effect from the immediate split, the constructed tree utilizes the available samples more efficiently. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss the rationale in general settings.
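The core splitting idea can be illustrated on an XOR target, where no variable helps the immediate split but splitting on either of the two signal variables unlocks perfect child splits. The data, impurity measure (variance), and one-level lookahead below are a simplified sketch of this principle, not the RLT algorithm itself.

```python
import random

random.seed(0)
n, p = 2000, 5
X = [[random.randint(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] ^ row[1] for row in X]   # XOR of the first two variables

def variance(idx):
    if not idx:
        return 0.0
    m = sum(y[i] for i in idx) / len(idx)
    return sum((y[i] - m) ** 2 for i in idx) / len(idx)

def split(idx, j):
    return [i for i in idx if X[i][j] == 0], [i for i in idx if X[i][j] == 1]

def immediate_gain(idx, j):
    if not idx:
        return 0.0
    l, r = split(idx, j)
    return variance(idx) - (len(l) * variance(l) + len(r) * variance(r)) / len(idx)

def lookahead_gain(idx, j):
    # Credit a split for the best gains it makes available one level deeper.
    l, r = split(idx, j)
    best = lambda s: max((immediate_gain(s, k) for k in range(p)), default=0.0)
    return immediate_gain(idx, j) + (len(l) * best(l) + len(r) * best(r)) / len(idx)

idx = list(range(n))
for j in range(p):
    print(f"x{j}: immediate {immediate_gain(idx, j):.4f}   lookahead {lookahead_gain(idx, j):.4f}")
```

Running this prints near-zero immediate gains for every variable but clearly larger lookahead gains for x0 and x1, which is exactly the situation where choosing splits by future improvement beats choosing by marginal effect.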
Project description: Background: Negative symptoms are core features of schizophrenia (SZ); however, the cognitive and neural basis of individual negative symptom domains remains unclear. Converging evidence suggests a role for striatal and prefrontal dopamine in reward learning and in the exploration of actions that might produce outcomes better than the status quo. The current study examines whether deficits in reinforcement learning and uncertainty-driven exploration predict specific negative symptom domains. Methods: We administered a temporal decision-making task, which required trial-by-trial adjustment of reaction time to maximize reward receipt, to 51 patients with SZ and 39 age-matched healthy control subjects. Task conditions were designed such that expected value (probability × magnitude) increased, decreased, or remained constant with increasing response times. Computational analyses were applied to estimate the degree to which trial-by-trial responses are influenced by reinforcement history. Results: Individuals with SZ showed impaired Go learning but intact NoGo learning relative to control subjects. These effects were most pronounced in patients with higher levels of negative symptoms. Uncertainty-based exploration was substantially reduced in individuals with SZ and selectively correlated with clinical ratings of anhedonia. Conclusions: Schizophrenia patients, particularly those with high negative symptoms, failed to speed reaction times to increase positive outcomes and showed a reduced tendency to explore when alternative actions could lead to better outcomes than the status quo. Results are interpreted in the context of current computational, genetic, and pharmacological data supporting the roles of striatal and prefrontal dopamine in these processes.
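As a worked illustration of the task logic (not the study's actual parameterization), the snippet below builds three hypothetical probability and magnitude schedules whose product, the expected value, increases, decreases, or stays constant as response time grows.

```python
# Hypothetical schedules only; the functions used in the study are not reproduced.
def ev(prob, mag, t):
    return prob(t) * mag(t)   # expected value = probability x magnitude

conditions = {
    "IEV (EV increases with RT)": (lambda t: 0.3 + 0.05 * t, lambda t: 10 + 5 * t),
    "DEV (EV decreases with RT)": (lambda t: 0.9 - 0.08 * t, lambda t: 30 - 1 * t),
    "CEV (EV constant in RT)":    (lambda t: 0.8 / (1 + t),  lambda t: 20 * (1 + t)),
}

for name, (p, m) in conditions.items():
    print(name, [round(ev(p, m, t), 1) for t in range(0, 5)])
```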
Project description: Reinforcement learning in complex natural environments is a challenging task because the agent should generalize from the outcomes of actions taken in one state of the world to future actions in different states of the world. The extent to which human experts find the proper level of generalization is unclear. Here we show, using the sequences of field goal attempts made by professional basketball players, that the outcome of even a single field goal attempt has a considerable effect on the rate of subsequent 3-point shot attempts, in line with standard models of reinforcement learning. However, this change in behaviour is associated with negative correlations between the outcomes of successive field goal attempts. These results indicate that despite years of experience and high motivation, professional players overgeneralize from the outcomes of their most recent actions, which leads to decreased performance.
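A minimal sketch of the kind of standard model referred to here: a single outcome updates the learned value of attempting a 3-pointer, which shifts the attempt rate through a logistic choice rule. The learning rate, choice sensitivity, and value variables are illustrative assumptions.

```python
import math

ALPHA, BETA = 0.2, 3.0     # illustrative learning rate and choice sensitivity
v3, v2 = 0.0, 0.0          # learned values of attempting a 3 vs. a 2

def p_attempt_3():
    # Logistic policy: a higher relative value of the 3 means more 3-point attempts.
    return 1.0 / (1.0 + math.exp(-BETA * (v3 - v2)))

print("baseline P(3-pt attempt):", round(p_attempt_3(), 3))
v3 += ALPHA * (1.0 - v3)   # a single made 3 (outcome 1) raises its value...
print("after a made 3:   ", round(p_attempt_3(), 3))
v3 += ALPHA * (0.0 - v3)   # ...and a single miss (outcome 0) lowers it again
print("after a missed 3: ", round(p_attempt_3(), 3))
```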
Project description: Reinforcement learning is a computational framework in which an active agent learns behaviors on the basis of a scalar reward signal. The agent can be an animal, a human, or an artificial system such as a robot or a computer program. The reward can be food, water, money, or any other measure of the agent's performance. The theory of reinforcement learning, which was developed in the artificial intelligence community with intuitions from animal learning theory, is now giving a coherent account of the function of the basal ganglia. It now serves as a "common language" in which biologists, engineers, and social scientists can exchange their problems and findings. This article reviews the basic theoretical framework of reinforcement learning and discusses its recent and future contributions to the understanding of animal behaviors and human decision making.
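For readers new to the framework, here is a textbook TD(0) sketch of the kind of account the review describes: a scalar reward drives value updates through a prediction error, the quantity commonly linked to dopamine signaling in the basal ganglia. The cue-wait-reward chain is a made-up toy, not an example from the article.

```python
GAMMA, ALPHA = 0.9, 0.1
V = {s: 0.0 for s in ("cue", "wait", "reward")}
episode = [("cue", "wait", 0.0), ("wait", "reward", 1.0)]  # (state, next state, reward)

for _ in range(200):
    for s, s_next, r in episode:
        delta = r + GAMMA * V[s_next] - V[s]   # the TD prediction error
        V[s] += ALPHA * delta

print(V)  # value has propagated back from the reward to the predictive cue
```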
Project description: Rectilinear crawling locomotion is a primitive and common mode of locomotion in slender soft-bodied animals. It requires coordinated contractions that propagate along a body that interacts frictionally with its environment. We propose a simple approach to understand how this coordination arises in a neuromechanical model of a segmented, soft-bodied crawler via an iterative process that might have both biological antecedents and technological relevance. Using a simple reinforcement learning algorithm, we show that an initial all-to-all neural coupling converges to a simple nearest-neighbour neural wiring that allows the crawler to move forward using a localized wave of contraction that is qualitatively similar to what is observed in Drosophila melanogaster larvae and used in many biomimetic solutions. The resulting solution is a function of how we weight gait regularization in the reward, with a trade-off between speed and robustness to proprioceptive noise. Overall, our results, which embed the brain-body-environment triad in a learning scheme, have relevance for soft robotics while shedding light on the evolution and development of locomotion.
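The speed-robustness trade-off can be caricatured with a toy objective: hill-climb per-segment activation delays to maximize reward = speed - lam * irregularity. The surrogate "speed" term, the delay parameterization, and the random-search learner below are stand-ins for the paper's neuromechanical model and learning algorithm, kept only to show how the regularization weight shapes the learned gait.

```python
import random

random.seed(1)
S, LAM = 8, 0.5   # number of segments; gait-regularization weight (assumed)

def reward(d, lam=LAM):
    gaps = [d[i + 1] - d[i] for i in range(S - 1)]
    mean_gap = sum(gaps) / len(gaps)
    # Toy surrogate: "speed" favors an ordered front-to-back wave of
    # contraction; "irregularity" is the variance of the activation gaps.
    speed = sum(1.0 for g in gaps if g > 0) / len(gaps)
    irregularity = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
    return speed - lam * irregularity

d = [random.uniform(0, 1) for _ in range(S)]   # per-segment activation delays
for _ in range(5000):
    cand = [x + random.gauss(0, 0.05) for x in d]
    if reward(cand) > reward(d):               # keep perturbations that help
        d = cand

print("learned delays:", [round(x, 2) for x in d])
print("final reward:  ", round(reward(d), 3))
```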
Project description: In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the rewards collected while interacting with their environment, making use of prior knowledge available beforehand. Many BRL algorithms have already been proposed, but the benchmarks used to compare them are only relevant for specific cases. This paper addresses the problem and provides a new BRL comparison methodology along with a corresponding open-source library. In this methodology, we define a comparison criterion that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from given probability distributions. To enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of each algorithm's computation-time requirements. Our library is released with full source code and documentation: it includes three test problems, each with two different prior distributions, and seven state-of-the-art RL algorithms. Finally, we illustrate the library by comparing all the available algorithms, and we discuss the results.
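A minimal sketch of the comparison idea (not the released library's API): draw MDPs from a prior, run an agent on each, average the returns, and record wall-clock time so non-anytime algorithms can be compared on an equal footing. The MDP prior, problem sizes, and the Q-learning baseline are illustrative choices.

```python
import random, time

S, A, H = 5, 2, 50   # states, actions, horizon (illustrative sizes)

def sample_mdp(rng):
    # Prior over MDPs: random transition kernels and rewards on [0, 1).
    T = [[[rng.random() for _ in range(S)] for _ in range(A)] for _ in range(S)]
    for s in range(S):
        for a in range(A):
            z = sum(T[s][a]); T[s][a] = [p / z for p in T[s][a]]
    R = [[rng.random() for _ in range(A)] for _ in range(S)]
    return T, R

def run_q_learning(mdp, rng, episodes=200, alpha=0.2, eps=0.1, gamma=0.95):
    T, R = mdp
    Q = [[0.0] * A for _ in range(S)]
    total = 0.0
    for _ in range(episodes):
        s = 0
        for _ in range(H):
            a = rng.randrange(A) if rng.random() < eps else max(range(A), key=lambda x: Q[s][x])
            r = R[s][a]
            s2 = rng.choices(range(S), weights=T[s][a])[0]
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            total += r
            s = s2
    return total / episodes   # mean per-episode return on this MDP

rng = random.Random(0)
t0 = time.perf_counter()
scores = [run_q_learning(sample_mdp(rng), rng) for _ in range(20)]
print("mean return over sampled MDPs:", round(sum(scores) / len(scores), 2),
      "| time:", round(time.perf_counter() - t0, 2), "s")
```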
Project description: E-learning systems can provide more adaptive and efficient learning experiences than traditional classroom settings. A key component of such systems is the learning policy, an algorithm that designs learning paths, that is, selects learning materials for learners based on information such as the learners' current progress and skills and the contents of the learning materials. In this article, the authors address the problem of finding the optimal learning policy. To this end, a model for learners' hierarchical skills in the E-learning system is first developed. Based on this hierarchical skill model and a classical cognitive diagnosis model, a framework is then developed to model the various mastery levels associated with hierarchical skills. The optimal learning path, which takes the hierarchical structure of skills into account, is found by applying a model-free reinforcement learning method that requires no assumptions about learners' learning transition processes. The effectiveness of the proposed framework is demonstrated via simulation studies.
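A hedged sketch of the core mechanism: model-free Q-learning over a simulated learner whose skills form a prerequisite hierarchy, where the action is which material to present next. The skill graph, success probability, and reward shape are invented for the demo and are not the article's model.

```python
import random

rng = random.Random(0)
prereq = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}  # invented skill hierarchy
skills = list(prereq)
P_LEARN = 0.7                       # chance a material sticks when prereqs are met
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1
Q = {}

def step(state, skill):
    # Presenting material for `skill` only works once its prerequisites are mastered.
    if skill not in state and all(p in state for p in prereq[skill]) and rng.random() < P_LEARN:
        state = state | {skill}
    reward = 10.0 if len(state) == len(skills) else -1.0   # -1 per step: finish fast
    return state, reward

for _ in range(3000):
    state = frozenset()
    for _ in range(30):
        qs = Q.setdefault(state, {s: 0.0 for s in skills})
        a = rng.choice(skills) if rng.random() < EPS else max(qs, key=qs.get)
        nxt, r = step(state, a)
        nq = Q.setdefault(nxt, {s: 0.0 for s in skills})
        qs[a] += ALPHA * (r + GAMMA * max(nq.values()) - qs[a])
        state = nxt
        if len(state) == len(skills):
            break

state, steps = frozenset(), 0       # greedy rollout: should respect the hierarchy
while len(state) < len(skills) and steps < 20:
    a = max(Q[state], key=Q[state].get)
    print("present material for skill", a)
    state, _ = step(state, a)
    steps += 1
```

The learned greedy policy presents material for A before B and C, and those before D, mirroring how an optimal learning path must respect the skill hierarchy.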
Project description: Reinforcement learning involves decision making in dynamic and uncertain environments and constitutes an important element of artificial intelligence (AI). In this work, we experimentally demonstrate that the ultrafast chaotic oscillatory dynamics of lasers efficiently solve the multi-armed bandit problem (MAB), which requires decision making concerning a class of difficult trade-offs called the exploration-exploitation dilemma. To solve the MAB, a certain degree of randomness is required for exploration purposes. However, pseudorandom numbers generated using conventional electronic circuitry face severe limitations in data rate and quality of randomness due to their algorithmic foundations. We generate laser chaos signals using a semiconductor laser sampled at a maximum rate of 100 GSample/s and combine them with a simple decision-making principle called tug of war with a variable threshold to ensure ultrafast, adaptive, and accurate decision making at a maximum adaptation speed of 1 GHz. We found that decision-making performance was maximized at an optimal sampling interval, and we highlight the coincidence between the negative autocorrelation inherent in laser chaos and decision-making performance. This study paves the way for a new realm of ultrafast photonics in the age of AI, where the ultrahigh bandwidth of light waves can provide new value.
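A simplified sketch of threshold-based bandit decisions driven by a chaotic signal: a logistic map stands in for the laser chaos, and the threshold update is a basic tug-of-war-style rule rather than the exact scheme in the paper. The reward probabilities and step size are illustrative assumptions.

```python
import random

P = [0.4, 0.6]              # hidden reward probabilities of the two arms
OMEGA = 0.02                # threshold step size (illustrative)
rng = random.Random(0)

x, threshold = 0.37, 0.5    # chaotic state in (0, 1); decision threshold
wins, plays = [0, 0], [0, 0]

for _ in range(10000):
    x = 3.99 * x * (1 - x)              # logistic map: chaos as the randomness source
    arm = 1 if x >= threshold else 0    # compare signal with the variable threshold
    reward = 1 if rng.random() < P[arm] else 0
    plays[arm] += 1; wins[arm] += reward
    # Tug of war: a win pulls the threshold to favor the chosen arm next time,
    # and a loss pushes it the other way.
    pull = OMEGA if reward else -OMEGA
    threshold += -pull if arm == 1 else pull
    threshold = min(max(threshold, 0.0), 1.0)

print("play counts:", plays, "| empirical win rates:",
      [round(w / max(n, 1), 3) for w, n in zip(wins, plays)])
```

Over time the threshold drifts toward the better arm, so exploration is driven entirely by the chaotic waveform while the variable threshold handles exploitation.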