ABSTRACT:
SUBMITTER: Liao P
PROVIDER: S-EPMC10072865 | biostudies-literature | 2022 Dec
REPOSITORIES: biostudies-literature
Liao Peng, Qi Zhengling, Wan Runzhe, Klasnja Predrag, Murphy Susan A
Annals of Statistics, 2022-12-21, issue 6
We consider the batch (offline) policy learning problem in the infinite-horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is mea ...
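The "doubly robust" property mentioned above can be illustrated with a small numerical sketch. It is a hypothetical toy, not the authors' estimator or code: it assumes a made-up two-state, two-action MDP, a fixed target policy `pi`, and a data distribution `d_data` over state-action pairs. It uses one common self-normalized form of a doubly robust average-reward formula, eta = E[w(S,A)(R + U(S') - Q(S,A))] / E[w(S,A)], where `Q` is the differential (relative) action-value function, `U(s)` averages `Q(s, .)` over `pi`, and `w` is the ratio of the policy's stationary state-action distribution to `d_data`. Expectations are computed exactly rather than from batch data, so the check isolates the algebra: the formula recovers the true average reward when either `Q` or `w` is correct, even if the other is wrong.

```python
# Hypothetical toy illustration: MDP, policies, and nuisance functions are made up,
# and expectations are exact (no sampling), so this is a population-level check only.
S, A = 2, 2  # number of states and actions

# P[s][a][s2]: transition probabilities; r[s][a]: expected reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.3, 0.7], [0.6, 0.4]]]
r = [[1.0, 0.0], [0.5, 2.0]]

pi = [[0.7, 0.3], [0.4, 0.6]]      # target policy pi(a | s)
d_data = [[0.1, 0.4], [0.3, 0.2]]  # assumed data distribution over (s, a), full support


def state_chain(policy):
    """State-to-state transition matrix when actions follow `policy`."""
    return [[sum(policy[s][a] * P[s][a][s2] for a in range(A)) for s2 in range(S)]
            for s in range(S)]


def stationary(Pm, iters=1000):
    """Stationary distribution of an ergodic chain via power iteration."""
    mu = [1.0 / S] * S
    for _ in range(iters):
        mu = [sum(mu[s] * Pm[s][s2] for s in range(S)) for s2 in range(S)]
    return mu


def true_quantities(iters=1000):
    """True average reward eta, differential Q, and ratio w = d_pi / d_data."""
    Pm = state_chain(pi)
    mu = stationary(Pm)
    d_pi = [[mu[s] * pi[s][a] for a in range(A)] for s in range(S)]
    eta = sum(d_pi[s][a] * r[s][a] for s in range(S) for a in range(A))
    # V solves V = r_pi - eta + Pm V (up to a constant); the series sum_k Pm^k (r_pi - eta)
    # converges because r_pi - eta has mean zero under the stationary distribution.
    r_pi = [sum(pi[s][a] * r[s][a] for a in range(A)) for s in range(S)]
    V = [0.0] * S
    for _ in range(iters):
        V = [r_pi[s] - eta + sum(Pm[s][s2] * V[s2] for s2 in range(S)) for s in range(S)]
    Q = [[r[s][a] - eta + sum(P[s][a][s2] * V[s2] for s2 in range(S)) for a in range(A)]
         for s in range(S)]
    w = [[d_pi[s][a] / d_data[s][a] for a in range(A)] for s in range(S)]
    return eta, Q, w


def dr_average_reward(Q_hat, w_hat):
    """Self-normalized DR value E[w (R + U(S') - Q(S, A))] / E[w] under d_data."""
    U = [sum(pi[s][a] * Q_hat[s][a] for a in range(A)) for s in range(S)]
    num = den = 0.0
    for s in range(S):
        for a in range(A):
            td = r[s][a] + sum(P[s][a][s2] * U[s2] for s2 in range(S)) - Q_hat[s][a]
            num += d_data[s][a] * w_hat[s][a] * td
            den += d_data[s][a] * w_hat[s][a]
    return num / den


eta, Q, w = true_quantities()
Q_bad = [[float(s + 2 * a) for a in range(A)] for s in range(S)]  # misspecified Q
w_bad = [[1.0] * A for _ in range(S)]                             # misspecified ratio

est_q_right = dr_average_reward(Q, w_bad)       # correct Q, wrong ratio
est_w_right = dr_average_reward(Q_bad, w)       # wrong Q, correct ratio
est_both_bad = dr_average_reward(Q_bad, w_bad)  # both wrong
print(eta, est_q_right, est_w_right, est_both_bad)
```

In this toy setup the first three printed values agree up to floating-point error, while the value with both nuisance functions misspecified (about 0.542 here) does not. In practice `Q` and `w` would have to be estimated from batch data, and the sketch says nothing about the semiparametric efficiency result or the policy optimization step described in the abstract.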