Data-driven decisions in stochastic systems : reliability and efficiency

Placeholder Show Content

Abstract/Contents

Abstract
In Chapters 2-4, we develop approximations for the distribution of regret of multi-armed bandit algorithms. These approximations yield new insights about the exploration-exploitation trade-off in bandit environments. Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that algorithms that are optimal over certain exponential families can achieve expected regret that grows at $\log(T)$ rates with the time horizon $T$, as specified by the Lai-Robbins lower bound. In Chapter 2, we show that when one uses such optimized algorithms, the resulting regret distribution necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023; 2023
Issuance monographic
Language English

Creators/Contributors

Author Fan, Lin
Degree supervisor Glynn, Peter
Thesis advisor Glynn, Peter
Thesis advisor Blanchet Mancilla, Jose
Thesis advisor Pelger, Markus
Degree committee member Blanchet Mancilla, Jose
Degree committee member Pelger, Markus
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Management Science and Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Lin Fan.
Note Submitted to the Department of Management Science and Engineering.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/xk241jp4730

Access conditions

Copyright
© 2023 by Lin Fan
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

Loading usage metrics...