Dataset Information

Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting.

ABSTRACT: The online decision-making problem requires us to make a sequence of decisions based on incremental information. Common solutions often need to learn a reward model of different actions given the contextual information and then maximize the long-term reward. It is therefore important to know whether the posited model is reasonable and how it performs in the asymptotic sense. We study this problem under the contextual bandit framework with a linear reward model. The ε-greedy policy is adopted to address the classic exploration-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose an online weighted least squares estimator using inverse propensity score weighting and also establish its asymptotic normality. Based on the properties of the parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!.
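
The pipeline the abstract describes (ε-greedy action selection, per-arm online least squares, inverse-propensity weighting, and an in-sample IPW value estimate) can be sketched as below. This is a minimal illustration under our own assumptions, not the authors' code: two or more arms, a context with an intercept plus standard normal covariates, Gaussian reward noise, a fixed ε, and a tiny ridge term (not part of the paper's OLS) so each arm's solve is defined before that arm has enough observations. The names `run_eps_greedy` and `solve` are hypothetical. The value estimator uses one common in-sample IPW form, the average of r_t·1{a_t = d̂(x_t)}/π_t(a_t | x_t), where d̂ is the final estimated greedy rule; the paper's exact definitions and its variance formulas are not reproduced here.

```python
import random


def solve(A, b):
    """Solve A x = b for a small dense d x d system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy; A and b stay untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def run_eps_greedy(beta_true, T=5000, eps=0.1, ridge=1e-3, noise=0.5, seed=0):
    """Hypothetical simulation: linear rewards x^T beta_a plus Gaussian noise."""
    K, d = len(beta_true), len(beta_true[0])
    rng = random.Random(seed)
    eye = lambda: [[ridge * (i == j) for j in range(d)] for i in range(d)]
    S, u = [eye() for _ in range(K)], [[0.0] * d for _ in range(K)]    # OLS stats
    Sw, uw = [eye() for _ in range(K)], [[0.0] * d for _ in range(K)]  # 1/propensity-weighted stats
    beta_hat = [[0.0] * d for _ in range(K)]
    history = []  # (context, action, reward, propensity of the chosen action)
    for _ in range(T):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(d - 1)]
        # Greedy arm under estimates from rounds before this one.
        greedy = max(range(K), key=lambda k: dot(x, beta_hat[k]))
        a = rng.randrange(K) if rng.random() < eps else greedy
        prop = (1 - eps) * (a == greedy) + eps / K  # P(action a | x, history)
        r = dot(x, beta_true[a]) + rng.gauss(0.0, noise)
        w = 1.0 / prop
        for i in range(d):
            u[a][i] += x[i] * r
            uw[a][i] += w * x[i] * r
            for j in range(d):
                S[a][i][j] += x[i] * x[j]
                Sw[a][i][j] += w * x[i] * x[j]
        beta_hat[a] = solve(S[a], u[a])  # online OLS refit for the arm just played
        history.append((x, a, r, prop))
    beta_wls = [solve(Sw[k], uw[k]) for k in range(K)]

    def d_hat(x):  # final estimated greedy decision rule
        return max(range(K), key=lambda k: dot(x, beta_hat[k]))

    value = sum(r * (a == d_hat(x)) / p for x, a, r, p in history) / T
    return beta_hat, beta_wls, value
```

For example, `run_eps_greedy([[1.0, 0.5, -0.5], [0.5, -0.5, 1.0]])` returns per-arm OLS estimates, per-arm IPW-weighted estimates, and the IPW value estimate. Repeating the call over many seeds and inspecting the spread of the estimates is one informal way to look at the asymptotic normality the abstract establishes.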

SUBMITTER: Chen H

PROVIDER: S-EPMC7962379 | biostudies-literature | 2021

REPOSITORIES: biostudies-literature

Publications

Statistical Inference for Online Decision-Making: In a Contextual Bandit Setting.

Haoyu Chen, Wenbin Lu, Rui Song

Journal of the American Statistical Association, 2020-07-07, issue 533

PMID: 33737759

Similar Datasets

Post-Contextual-Bandit Inference.
| S-EPMC9249103 | biostudies-literature

Online Updating of Statistical Inference in the Big Data Setting.
| S-EPMC5179229 | biostudies-literature

The Role of Expert Judgment in Statistical Inference and Evidence-Based Decision-Making.
| S-EPMC6474725 | biostudies-literature

Decision-making without a brain: how an amoeboid organism solves the two-armed bandit.
| S-EPMC4938078 | biostudies-literature

Continuous evolution of statistical estimators for optimal decision-making.
| S-EPMC3382620 | biostudies-literature

A Biased Bayesian Inference for Decision-Making and Cognitive Control.
| S-EPMC6195105 | biostudies-literature

Statistical Inference for High-Dimensional Models via Recursive Online-Score Estimation.
| S-EPMC8439566 | biostudies-literature

A statistical theory of optimal decision-making in sports betting.
| S-EPMC10306238 | biostudies-literature

Flexible and efficient simulation-based inference for models of decision-making.
| S-EPMC9374439 | biostudies-literature

Shared Decision Making in Cardiovascular Disease in the Outpatient Setting.
| S-EPMC8301252 | biostudies-literature