Minimax regret bounds for stochastic linear bandit algorithms
Abstract/Contents
- Abstract
- The multi-armed bandit problem is a popular framework to model online experiments and the ideas in this domain are intended to help making better decisions as more data becomes available to a learner. Stochastic linear bandits are an important subclass of this general problem that have many applications in areas such as dynamic asset pricing and healthcare. In this dissertation, we study this problem and provide an analysis that produces many existing results as well as some new bounds. Our analysis also naturally leads to a new algorithm, called Sieved-Greedy (SG), that reduces the exploration while maintaining strong theoretical guarantees. Furthermore, we show that all of these bounds can be improved significantly under additional gap assumptions. In particular, we show that the regret of Optimism in Face of Uncertainty Linear bandit (OFUL) and Thompson Sampling (TS), two popular algorithms in this domain, scale poly-logarithmically under these assumptions (rather than as $\sqrt T$ ). Next, we focus on TS and prove that, although it achieves near optimal Bayesian regret bounds, it may perform poorly under mild distributional mismatches. Finally, we introduce a data-driven version of TS and prove that, under mild conditions, it achieves minimax optimal regret even in the worst case. An important property of both of our proposed algorithms (SG and improved TS) is that they use the observed data to adjust the rate of exploration.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021; 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Hamidi, Nima |
---|---|
Degree supervisor | Bayati, Mohsen |
Thesis advisor | Bayati, Mohsen |
Thesis advisor | Candès, Emmanuel J. (Emmanuel Jean) |
Thesis advisor | Lai, T. L |
Degree committee member | Candès, Emmanuel J. (Emmanuel Jean) |
Degree committee member | Lai, T. L |
Associated with | Stanford University, Department of Statistics |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Nima Hamidi. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis Ph.D. Stanford University 2021. |
Location | https://purl.stanford.edu/nj675kr8706 |
Access conditions
- Copyright
- © 2021 by Nima Hamidi
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
Also listed in
Loading usage metrics...