ABSTRACT:
SUBMITTER: Chen H
PROVIDER: S-EPMC7962379 | biostudies-literature | 2021
REPOSITORIES: biostudies-literature
Chen Haoyu, Lu Wenbin, Song Rui
Journal of the American Statistical Association, 2020-07-07, 533
Online decision-making problems require us to make a sequence of decisions based on incremental information. Common solutions often need to learn a reward model of different actions given the contextual information and then maximize the long-term reward. It is meaningful to know whether the posited model is reasonable and how the model performs in the asymptotic sense. We study this problem under the setup of the contextual bandit framework with a linear reward model. The ε-greedy policy is ad ...
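The setup named in the abstract (a contextual bandit with a linear reward model, acted on by an ε-greedy policy) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' procedure: the names `LinearArm`, `epsilon_greedy` and `simulate` are hypothetical, the true coefficients and noise level are made up for illustration, and the ridge term in the per-arm least-squares fit is an assumption added so the estimate is defined from the first step.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class LinearArm:
    """Online least-squares fit of one arm's linear reward model.

    The ridge term is an illustrative assumption, not taken from the abstract.
    """

    def __init__(self, dim, ridge=1.0):
        # a_inv tracks (ridge * I + sum of x x^T)^{-1}; b tracks sum of reward * x.
        self.a_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse (a_inv is symmetric).
        ax = [dot(row, x) for row in self.a_inv]
        denom = 1.0 + dot(x, ax)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(n):
            self.b[i] += reward * x[i]

    def coef(self):
        """Current coefficient estimate a_inv @ b."""
        return [dot(row, self.b) for row in self.a_inv]


def epsilon_greedy(x, arms, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise pick the arm
    whose fitted linear model predicts the highest reward for context x."""
    if rng.random() < epsilon:
        return rng.randrange(len(arms))
    values = [dot(arm.coef(), x) for arm in arms]
    return max(range(len(arms)), key=values.__getitem__)


def simulate(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Hypothetical true coefficients, one vector per arm.
    true_beta = [[1.0, -1.0], [-1.0, 1.0]]
    arms = [LinearArm(dim=2) for _ in true_beta]
    for _ in range(steps):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]  # observed context
        a = epsilon_greedy(x, arms, epsilon, rng)
        reward = dot(true_beta[a], x) + rng.gauss(0, 0.1)  # noisy linear reward
        arms[a].update(x, reward)  # only the chosen arm's model is updated
    return [arm.coef() for arm in arms], true_beta


if __name__ == "__main__":
    estimates, truth = simulate()
    for k, (est, tru) in enumerate(zip(estimates, truth)):
        print(f"arm {k}: estimate={est} truth={tru}")
```

Running the script prints each arm's fitted coefficients next to the simulated truth, which gives a rough feel for how the estimates behave when the data are collected adaptively by the policy rather than sampled independently.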