Data-driven decisions in stochastic systems : reliability and efficiency

<a href="https://embed.stanford.edu/iframe/?url=https%3A%2F%2Fpurl.stanford.edu%2Fxk241jp4730" class="su-underline">Show Content</a>

Abstract/Contents

Abstract: In Chapters 2-4, we develop approximations for the distribution of regret of multi-armed bandit algorithms. These approximations yield new insights about the exploration-exploitation trade-off in bandit environments. Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that algorithms that are optimal over certain exponential families can achieve expected regret that grows at $\log(T)$ rates with the time horizon $T$, as specified by the Lai-Robbins lower bound. In Chapter 2, we show that when one uses such optimized algorithms, the resulting regret distribution necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023; 2023
Issuance	monographic
Language	English

Author	Fan, Lin
Degree supervisor	Glynn, Peter
Thesis advisor	Glynn, Peter
Thesis advisor	Blanchet Mancilla, Jose
Thesis advisor	Pelger, Markus
Degree committee member	Blanchet Mancilla, Jose
Degree committee member	Pelger, Markus
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Department of Management Science and Engineering

Genre	Theses
Genre	Text

Statement of responsibility	Lin Fan.
Note	Submitted to the Department of Management Science and Engineering.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/xk241jp4730

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

View in SearchWorks

Loading usage metrics...