ABSTRACT: Melioration learning is an empirically well-grounded model of reinforcement learning. By means of computer simulations, this paper derives predictions for several repeatedly played two-person games from this model. The results indicate a likely convergence to a pure Nash equilibrium of the game. If no pure equilibrium exists, the relative frequencies of choice may approach the predictions of the mixed Nash equilibrium. Yet in some games, no stable state is reached.
Project description:Humans and animals face decision tasks in an uncertain multi-agent environment where an agent's strategy may change over time due to the co-adaptation of others' strategies. The neuronal substrate and the computational algorithms underlying such adaptive decision making, however, are largely unknown. We propose a population coding model of spiking neurons with a policy gradient procedure that successfully acquires optimal strategies for classical game-theoretical tasks. The suggested population reinforcement learning reproduces data from human behavioral experiments for the blackjack and the inspector game. It performs optimally according to a pure (deterministic) and mixed (stochastic) Nash equilibrium, respectively. In contrast, temporal-difference (TD) learning, covariance learning, and basic reinforcement learning fail to perform optimally for the stochastic strategy. Spike-based population reinforcement learning, shown to follow the stochastic reward gradient, is therefore a viable candidate to explain automated decision learning of a Nash equilibrium in two-player games.
Project description:In nature and society, problems that arise when different interests are difficult to reconcile are modeled in game theory. While most applications assume that the players make decisions based only on the payoff matrix, a more detailed modeling is necessary if we also want to consider the influence of correlations on the decisions of the players. We therefore extend the existing framework of correlated strategies by giving the players the freedom to respond to the instructions of the correlation device by probabilistically following or not following its suggestions. This creates a new type of game that we call "correlated games". The associated response strategies that can solve these games turn out to have a rich structure of Nash equilibria that goes beyond the correlated equilibrium and pure or mixed-strategy solutions and also gives better payoffs in certain cases. We determine these Nash equilibria for all possible correlated Snowdrift games and find these solutions to be describable by Ising models in thermal equilibrium. We believe that our approach paves the way to a study of correlations in games that uncovers the existence of interesting underlying interaction mechanisms, without compromising the independence of the players.
Project description:The relative merits of cooperation and self-interest in an ensemble of strategic interactions can be investigated by using finite random games. In finite random games, finitely many players have finite numbers of actions and independently and identically distributed (iid) random payoffs with continuous distribution functions. In each realization, players are shown the values of all payoffs and then choose their strategies simultaneously. Noncooperative self-interest is modeled by Nash equilibrium (NE). Cooperation is advantageous when a NE is Pareto-inefficient. In ordinal games, the numerical value of the payoff function gives each player's ordinal ranking of payoffs. For a fixed number of players, as the number of actions of any player increases, the conditional probability that a pure strategic profile is not pure Pareto-optimal, given that it is a pure NE, apparently increases, but is bounded above strictly below 1. In games with transferable utility, the numerical payoff values may be averaged across actions (so that mixed NEs are meaningful) and added across players. In simulations of two-player games when both players have small, equal numbers of actions, as the number of actions increases, the probability that a NE (pure and mixed) attains the cooperative maximum declines rapidly; the gain from cooperation relative to the Nash high value decreases; and the gain from cooperation relative to the Nash low value rises dramatically. In the cases studied here, with an increasing number of actions, cooperation is increasingly likely to become advantageous compared with pure self-interest, but self-interest can achieve all that cooperation could achieve in a nonnegligible fraction of cases. These results can be interpreted in terms of cooperation in societies and mutualism in biology.
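The setup described above can be illustrated with a minimal sketch for the two-player case. It assumes Uniform(0, 1) payoffs as one possible continuous iid distribution; the function names (`random_game`, `pure_nash`, `is_pareto_optimal`) are hypothetical and not taken from the paper.

```python
import itertools
import random

def random_game(n_actions, rng):
    """Two-player game with iid Uniform(0, 1) payoffs (an assumed distribution).

    Returns [row_payoffs, col_payoffs], each a dict keyed by the pure profile (i, j).
    """
    profiles = list(itertools.product(range(n_actions), repeat=2))
    return [{prof: rng.random() for prof in profiles} for _ in range(2)]

def pure_nash(payoffs, n_actions):
    """List all pure profiles from which no player gains by deviating alone."""
    result = []
    for i, j in itertools.product(range(n_actions), repeat=2):
        row_ok = all(payoffs[0][(i, j)] >= payoffs[0][(k, j)] for k in range(n_actions))
        col_ok = all(payoffs[1][(i, j)] >= payoffs[1][(i, k)] for k in range(n_actions))
        if row_ok and col_ok:
            result.append((i, j))
    return result

def is_pareto_optimal(payoffs, profile):
    """True if no other pure profile helps one player without hurting the other."""
    u = [payoffs[p][profile] for p in range(2)]
    for other in payoffs[0]:
        v = [payoffs[p][other] for p in range(2)]
        if all(b >= a for a, b in zip(u, v)) and any(b > a for a, b in zip(u, v)):
            return False
    return True

# Prisoner's dilemma as a fixed sanity check: action 0 = cooperate, 1 = defect.
pd = [
    {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1},  # row player's payoffs
    {(0, 0): 3, (0, 1): 5, (1, 0): 0, (1, 1): 1},  # column player's payoffs
]
equilibria = pure_nash(pd, 2)                       # [(1, 1)]
inefficient = [e for e in equilibria if not is_pareto_optimal(pd, e)]

# One random realization with three actions per player.
game = random_game(3, random.Random(0))
random_equilibria = pure_nash(game, 3)
```

In the prisoner's dilemma, mutual defection is the only pure NE and it is not pure Pareto-optimal, which is the situation where cooperation is advantageous. Repeating the random draw many times and counting how often a pure NE fails `is_pareto_optimal` would give a rough estimate of the conditional probability discussed in the abstract.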
Project description:Game theory is widely used to model interacting biological and social systems. In some situations, players may converge to an equilibrium, e.g., a Nash equilibrium, but in other situations their strategic dynamics oscillate endogenously. If the system is not designed to encourage convergence, which of these two behaviors can we expect a priori? To address this question, we follow an approach that is popular in theoretical ecology to study the stability of ecosystems: We generate payoff matrices at random, subject to constraints that may represent properties of real-world games. We show that best reply cycles, basic topological structures in games, predict nonconvergence of six well-known learning algorithms that are used in biology or have support from experiments with human players. Best reply cycles are dominant in complicated and competitive games, indicating that in this case equilibrium is typically an unrealistic assumption, and one must explicitly model the dynamics of learning.
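As a rough illustration of the idea of a best reply cycle, the sketch below lets two players alternate best replies in a bimatrix game and reports whether the sequence settles on a pure Nash equilibrium or revisits an earlier column action without settling. This is a simplified stand-in, not the paper's definition or code; the function name `best_reply_outcome` and the tie-breaking rule (first maximal index) are assumptions.

```python
def best_reply_outcome(A, B, j0):
    """Alternate best replies, starting from column action j0.

    A[i][j] is the row player's payoff, B[i][j] the column player's.
    Returns ("equilibrium", (i, j)) if a mutual best reply is reached,
    or ("cycle", n) where n is the number of column actions in the cycle.
    """
    n_rows, n_cols = len(A), len(A[0])
    seen = {}  # column action -> step at which it was first visited
    j, step = j0, 0
    while j not in seen:
        seen[j] = step
        # Row player replies to column action j.
        i = max(range(n_rows), key=lambda r: A[r][j])
        # Column player replies to row action i.
        j_next = max(range(n_cols), key=lambda c: B[i][c])
        if j_next == j:
            return ("equilibrium", (i, j))
        j, step = j_next, step + 1
    return ("cycle", step - seen[j])

# Matching pennies: purely competitive, no pure equilibrium.
pennies_A = [[1, -1], [-1, 1]]
pennies_B = [[-1, 1], [1, -1]]
print(best_reply_outcome(pennies_A, pennies_B, 0))  # ('cycle', 2)

# Coordination game: best replies settle immediately.
coord = [[2, 0], [0, 1]]
print(best_reply_outcome(coord, coord, 1))          # ('equilibrium', (1, 1))
```

Applying the same check to randomly generated payoff matrices would show how often this simple dynamic cycles instead of converging.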
Project description:Information transfer is a basic feature of life that includes signaling within and between organisms. Owing to its interactive nature, signaling can be investigated by using game theory. Game theoretic models of signaling have a long tradition in biology, economics, and philosophy. For a long time the analysis of these games has mostly relied on using static equilibrium concepts such as Pareto optimal Nash equilibria or evolutionarily stable strategies. More recently signaling games of various types have been investigated with the help of game dynamics, which includes dynamical models of evolution and individual learning. A dynamical analysis leads to more nuanced conclusions as to the outcomes of signaling interactions. Here we explore different kinds of signaling games that range from interactions without conflicts of interest between the players to interactions where their interests are seriously misaligned. We consider these games within the context of evolutionary dynamics (both infinite and finite population models) and learning dynamics (reinforcement learning). Some results are specific features of a particular dynamical model, whereas others turn out to be quite robust across different models. This suggests that there are certain qualitative aspects that are common to many real-world signaling interactions.
Project description:This article studies correlated two-person games constructed from games with independent players as proposed in Iqbal et al. (2016 R. Soc. open sci. 3, 150477. (doi:10.1098/rsos.150477)). The games are played in a collective manner, both in a two-dimensional lattice where the players interact with their neighbours, and with players interacting at random. Four game types are scrutinized in iterated games where the players are allowed to change their strategies, adopting that of their best-paid neighbour. Particular attention is paid in the study to the effect of a variable degree of correlation on Nash equilibrium strategy pairs.
Project description:Economic experimental games have shown that individuals make decisions that deviate downward from the suboptimal Nash equilibrium. However, few studies have analyzed the case when the deviation is above the Nash equilibrium. Extracting above the Nash equilibrium is inefficient not only socially but also privately, and it would exacerbate the tragedy of the commons. That would be the case in a race to the fish when stocks are becoming depleted, or in driver behavior on a highly congested road. The objective of this study is to analyze privately inefficient extraction behavior in experimental games and to associate the type of player and the type of player group with such inefficient outcomes. To do this, we carried out economic experimental games with local coastal fishermen in Colombia, using a setting where the scarcity of the resource allows for an interior Nash equilibrium and inefficient over-extraction is possible. The state of the resource, the type of player and the composition of the group explain, in part, this inefficient behavior.
Project description:We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of the correspondence we establish to the dynamical MCC solution concept when the underlying evolutionary model's ranking-intensity parameter, α, is chosen to be large, which exactly forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley's Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that not only provide an overarching and unifying perspective of existing continuous- and discrete-time evolutionary evaluation models, but also reveal the formal underpinnings of the α-Rank methodology. We illustrate the method in canonical games and empirically validate it in several domains, including AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
Project description:We introduce Q-Nash, a quantum annealing algorithm for the NP-complete problem of finding pure Nash equilibria in graphical games. The algorithm consists of two phases. The first phase determines all combinations of best response strategies for each player using classical computation. The second phase finds pure Nash equilibria using a quantum annealing device by mapping the computed combinations to a quadratic unconstrained binary optimization formulation based on the Set Cover problem. We empirically evaluate Q-Nash on D-Wave’s Quantum Annealer 2000Q using different graphical game topologies. The results with respect to solution quality and computing time are compared to a Brute Force algorithm and the Iterated Best Response heuristic.
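The two-phase structure described above can be sketched classically. In the sketch below, phase 1 builds each player's best-response table over its neighbours' actions, as in the abstract; phase 2 is replaced by plain exhaustive search over global profiles, which stands in for (and is not) the quantum annealing step and its QUBO formulation. The graph encoding, the `payoff` callback signature, and all function names are assumptions made for illustration.

```python
import itertools

def best_response_table(player, neighbours, actions, payoff):
    """Phase 1: for each joint action of the neighbours, record the best responses."""
    table = {}
    for nb_actions in itertools.product(actions, repeat=len(neighbours)):
        scores = {a: payoff(player, a, nb_actions) for a in actions}
        best = max(scores.values())
        table[nb_actions] = {a for a, s in scores.items() if s == best}
    return table

def pure_equilibria(graph, actions, payoff):
    """Phase 2 (classical stand-in): keep profiles where everyone best-responds."""
    players = sorted(graph)
    tables = {p: best_response_table(p, graph[p], actions, payoff) for p in players}
    found = []
    for joint in itertools.product(actions, repeat=len(players)):
        profile = dict(zip(players, joint))
        if all(
            profile[p] in tables[p][tuple(profile[q] for q in graph[p])]
            for p in players
        ):
            found.append(profile)
    return found

# Hypothetical example: three players on a path 0 - 1 - 2, each rewarded
# for every neighbour that plays the same action.
path_graph = {0: [1], 1: [0, 2], 2: [1]}

def match_payoff(player, action, neighbour_actions):
    return sum(1 for a in neighbour_actions if a == action)

equilibria = pure_equilibria(path_graph, [0, 1], match_payoff)
print(equilibria)  # [{0: 0, 1: 0, 2: 0}, {0: 1, 1: 1, 2: 1}]
```

Each table only ranges over a player's neighbourhood, so phase 1 stays small on sparse graphs; the exhaustive phase 2 grows exponentially with the number of players, which is the part the abstract delegates to the annealer.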
Project description:We study adaptive learning in a typical p-player game. The payoffs of the games are randomly generated and then held fixed. The strategies of the players evolve through time as the players learn. The trajectories in the strategy space display a range of qualitatively different behaviours, with attractors that include unique fixed points, multiple fixed points, limit cycles and chaos. In the limit where the game is complicated, in the sense that the players can take many possible actions, we use a generating-functional approach to establish the parameter range in which learning dynamics converge to a stable fixed point. The size of this region goes to zero as the number of players goes to infinity, suggesting that complex non-equilibrium behaviour, exemplified by chaos, is the norm for complicated games with many players.