
Dataset Information


Reward-based option competition in human dorsal stream and transition from stochastic exploration to exploitation in continuous space.


ABSTRACT: Primates exploring and exploiting a continuous sensorimotor space rely on dynamic maps in the dorsal stream. Two complementary perspectives exist on how these maps encode rewards. Reinforcement learning models integrate rewards incrementally over time, efficiently resolving the exploration/exploitation dilemma. Working memory buffer models explain rapid plasticity of parietal maps but lack a plausible exploration/exploitation policy. The reinforcement learning model presented here unifies both accounts, enabling rapid, information-compressing map updates and efficient transition from exploration to exploitation. As predicted by our model, activity in human frontoparietal dorsal stream regions, but not in MT+, tracks the number of competing options, as preferred options are selectively maintained on the map, while spatiotemporally distant alternatives are compressed out. When valuable new options are uncovered, posterior β1/α oscillations desynchronize within 0.4 to 0.7 s, consistent with option encoding by competing β1-stabilized subpopulations. Together, outcomes matching locally cached reward representations rapidly update parietal maps, biasing choices toward often-sampled, rewarded options.
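The incremental reward integration contrasted with working-memory buffering in the abstract can be illustrated with a minimal delta-rule learner whose softmax policy anneals from stochastic exploration toward exploitation. This is a generic sketch under assumed settings (a discretized option space, hidden Gaussian-noise payoffs, an exponential temperature schedule), not the authors' actual task or model.

```python
import math
import random


def softmax(values, temperature):
    """Map values to choice probabilities; higher temperature -> more exploration."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def run(n_trials=500, n_options=10, alpha=0.1, seed=0):
    """Delta-rule reward integration over a discretized continuous space.

    The option count, payoff distribution, and annealing schedule are
    illustrative assumptions for this sketch only.
    """
    rng = random.Random(seed)
    true_reward = [rng.random() for _ in range(n_options)]  # hidden payoffs
    value = [0.0] * n_options                               # learned value map
    for t in range(n_trials):
        # Temperature decays each trial: early choices are near-random
        # (exploration); late choices concentrate on high-value options.
        temperature = max(0.05, 0.99 ** t)
        probs = softmax(value, temperature)
        choice = rng.choices(range(n_options), weights=probs)[0]
        reward = true_reward[choice] + rng.gauss(0, 0.1)    # noisy outcome
        value[choice] += alpha * (reward - value[choice])   # delta-rule update
    return value, true_reward
```

Because often-sampled, rewarded options accumulate value while rarely sampled alternatives stay near their initial estimates, repeated runs of this learner reproduce the qualitative pattern described above: choices progressively concentrate on a few preferred options.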

SUBMITTER: Hallquist MN 

PROVIDER: S-EPMC10889364 | biostudies-literature | 2024 Feb

REPOSITORIES: biostudies-literature


Publications

Reward-based option competition in human dorsal stream and transition from stochastic exploration to exploitation in continuous space.

Michael N. Hallquist, Kai Hwang, Beatriz Luna, Alexandre Y. Dombrovski

Science Advances, 2024-02-23 (8)


Similar Datasets

| S-EPMC5717252 | biostudies-literature
| S-EPMC5442137 | biostudies-literature
| S-EPMC6685008 | biostudies-literature
| S-EPMC3995913 | biostudies-literature
| S-EPMC5825268 | biostudies-other
| S-EPMC3955163 | biostudies-literature
| S-EPMC11226441 | biostudies-literature
| S-EPMC4443960 | biostudies-literature
| S-EPMC6505439 | biostudies-literature
| S-EPMC10947302 | biostudies-literature