Policy evaluation and learning in adaptive experiments
Abstract/Contents
- Abstract
- Adaptive experiments are becoming increasingly prevalent due to their ability to greatly improve sample efficiency in pursuit of particular objectives. As a result, data collected from such designs, for example contextual bandits, is increasingly available. A natural question arises: can we reuse these data to answer questions that the experiments were not originally designed to target? Adaptivity, however, poses serious statistical challenges when the post hoc objective differs substantially from the original one, and standard approaches for analyzing independently collected data can suffer from bias, excessive variance, or both. This thesis takes one step toward such post hoc analyses, organized around two themes: evaluating alternative treatment assignment policies to guide future innovation or experiments, and learning optimal policies to facilitate personalization. Our main contributions are as follows: (i) We present a family of generalized augmented inverse propensity weighted (AIPW) estimators for evaluating a given policy with adaptively collected data from multi-armed bandits. Our approach adaptively reweights the terms of an AIPW estimator to control the contribution of each term to the estimator's variance. This scheme reduces overall estimation variance and yields an asymptotically normal test statistic. (ii) We extend the adaptive weighting approach to evaluate policies in contextual bandits, where the weights are carefully chosen to accommodate variances of AIPW terms that may differ not only over time but also across the context space. The resulting estimator further reduces estimation variance. (iii) Based on a special variant of the above estimators, we propose an algorithm for learning optimal policies from contextual bandit data and establish its finite-sample regret bound. We complement this upper bound with a lower bound that characterizes the fundamental difficulty of policy learning with adaptive data.
Collectively, we hope our results can shed light on the design and implementation of hypothesis testing and efficient policy learning using adaptively collected data.
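The adaptively weighted AIPW scheme described in contribution (i) can be sketched in code. This is an illustrative simplification, not the thesis's exact estimator: it assumes variance-stabilizing weights of the form h_t(w) = sqrt(e_t(w)), which is one choice from the family of adaptive weights, and the function name `aw_aipw_value` and its inputs are hypothetical.

```python
import numpy as np

def aw_aipw_value(arms, rewards, propensities, mu_hat, target_policy):
    """Adaptively weighted AIPW estimate of a target policy's value.

    arms          : (T,)   arm indices chosen by the bandit algorithm
    rewards       : (T,)   observed rewards
    propensities  : (T, K) assignment probabilities e_t(w) used at time t
    mu_hat        : (T, K) outcome-model predictions fit on data before t
    target_policy : (K,)   arm probabilities of the policy being evaluated
    """
    T, K = propensities.shape
    rows = np.arange(T)

    # AIPW score for every arm at every time step:
    # Gamma_t(w) = mu_hat_t(w) + 1{W_t = w} * (Y_t - mu_hat_t(w)) / e_t(w)
    gamma = mu_hat.copy()
    gamma[rows, arms] += (rewards - mu_hat[rows, arms]) / propensities[rows, arms]

    # Adaptive, variance-stabilizing weights h_t(w) = sqrt(e_t(w)):
    # terms with small propensities (high variance) are down-weighted.
    h = np.sqrt(propensities)

    # Per-arm weighted average, then combine under the target policy.
    q_hat = (h * gamma).sum(axis=0) / h.sum(axis=0)
    return float(target_policy @ q_hat)

# Usage on synthetic bandit data (uniform propensities for illustration).
rng = np.random.default_rng(0)
T, K = 50, 3
propensities = np.full((T, K), 1.0 / K)
arms = rng.integers(0, K, size=T)
rewards = rng.normal(size=T)
mu_hat = np.zeros((T, K))
pi = np.array([0.2, 0.5, 0.3])
print(aw_aipw_value(arms, rewards, propensities, mu_hat, pi))
```

Note that when the propensities are constant, the adaptive weights are constant as well, so the estimate reduces to the ordinary (uniformly averaged) AIPW estimate; the weights matter precisely when the bandit algorithm changes its assignment probabilities over time.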
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Zhan, Ruohan |
---|---|
Degree supervisor | Athey, Susan |
Thesis advisor | Athey, Susan |
Thesis advisor | Van Roy, Benjamin |
Thesis advisor | Wager, Stefan |
Degree committee member | Van Roy, Benjamin |
Degree committee member | Wager, Stefan |
Associated with | Stanford University, Institute for Computational and Mathematical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Ruohan Zhan. |
---|---|
Note | Submitted to the Institute for Computational and Mathematical Engineering. |
Thesis | Thesis (Ph.D.), Stanford University, 2021. |
Location | https://purl.stanford.edu/wm876jf4432 |
Access conditions
- Copyright
- © 2021 by Ruohan Zhan
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).