Policy evaluation and learning in adaptive experiments


Abstract/Contents

Abstract
Adaptive experiments are becoming increasingly prevalent because they can substantially improve sample efficiency toward particular objectives. As a result, data collected under such designs, for example contextual bandits, are increasingly available. A natural question arises: can we reuse these data to answer questions the experiments were not originally designed to target? Adaptivity, however, poses serious statistical challenges when the post hoc objective differs significantly from the original one, and standard approaches for analyzing independently collected data can be plagued by bias, excessive variance, or both. This thesis takes a step toward such post hoc analyses, organized around two themes: evaluating alternative treatment assignment policies to guide future innovation or experiments, and learning optimal policies to facilitate personalization. Our main contributions are as follows. (i) We present a family of generalized augmented inverse propensity weighted (AIPW) estimators for evaluating a given policy with data collected adaptively from multi-armed bandits. Our approach adaptively reweights the terms of an AIPW estimator to control each term's contribution to the estimator's variance. This scheme reduces overall estimation variance and yields an asymptotically normal test statistic. (ii) We extend the adaptive weighting approach to policy evaluation in contextual bandits, where the weights are carefully chosen to accommodate variances of the AIPW terms that may differ not only over time but also across the context space. The resulting estimator further reduces estimation variance. (iii) Building on a variant of the above estimators, we propose an algorithm for learning optimal policies from contextual bandit data and establish its finite-sample regret bound. We complement this regret upper bound with a lower bound that characterizes the fundamental difficulty of policy learning with adaptive data. Collectively, we hope these results shed light on the design and implementation of hypothesis testing and efficient policy learning with adaptively collected data.
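To make the adaptive-weighting idea in contribution (i) concrete, the following is a minimal Python sketch of an adaptively weighted AIPW estimate of a single arm's value. It is an illustration of the general idea rather than the thesis's exact construction: the square-root-propensity weights are one variance-stabilizing choice from the related literature, and all names (aw_aipw_value, mu_hat, and so on) are hypothetical.

```python
import numpy as np

def aw_aipw_value(rewards, arms, propensities, target_arm, mu_hat):
    """Adaptively weighted AIPW estimate of the value of `target_arm`.

    Inputs (all length-T arrays, indexed by time t):
      rewards      : observed rewards Y_t
      arms         : arms W_t pulled by the bandit algorithm
      propensities : probability e_t the algorithm assigned to
                     `target_arm` at time t (logged during the experiment)
      mu_hat       : plug-in estimates of E[Y | W = target_arm] built
                     only from data observed before time t
    """
    rewards = np.asarray(rewards, dtype=float)
    arms = np.asarray(arms)
    e = np.asarray(propensities, dtype=float)
    mu = np.asarray(mu_hat, dtype=float)

    # AIPW score: conditionally unbiased for the arm value at each t,
    # since the propensity e_t is known from the bandit algorithm.
    gamma = mu + (arms == target_arm) * (rewards - mu) / e

    # Adaptive weights h_t = sqrt(e_t): a variance-stabilizing choice
    # that down-weights periods where the propensity is small and the
    # score's conditional variance is large.
    h = np.sqrt(e)
    return np.sum(h * gamma) / np.sum(h)
```

With uniform weights h_t = 1, this reduces to the ordinary sample average of AIPW scores; the adaptive weights are what control each term's variance contribution and restore asymptotic normality under adaptive data collection.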

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Zhan, Ruohan
Degree supervisor Athey, Susan
Thesis advisor Athey, Susan
Thesis advisor Van Roy, Benjamin
Thesis advisor Wager, Stefan
Degree committee member Van Roy, Benjamin
Degree committee member Wager, Stefan
Associated with Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ruohan Zhan.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/wm876jf4432

Access conditions

Copyright
© 2021 by Ruohan Zhan
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
