Thompson sampling beyond classical bandits

Kalkanli, Cem

Thompson sampling beyond classical bandits

<a href="https://embed.stanford.edu/iframe/?url=https%3A%2F%2Fpurl.stanford.edu%2Fyt956wv3845" class="su-underline">Show Content</a>

Abstract/Contents

Abstract: Thompson sampling has been shown to be an effective policy for a variety of sequential decision making problems. Motivated by its state-of-the-art empirical performance and straightforward implementation, many recent works have focused on analyzing its theoretical performance. Despite this interest, however, many questions remain unanswered. Can Thompson sampling identify the best action among many others in a sequential decision making problem? What is the best performance guarantee we can provide for Thompson sampling in a Gaussian linear model? Can Thompson sampling still perform well even when it receives delayed feedback in the form of batches? These questions lie at the heart of many real life applications, and by answering them this thesis contributes towards a better understanding of the performance of Thompson sampling in more complex and realistic scenarios. We first study the exploration capabilities of Thompson sampling. While it is well known that Thompson sampling is an optimal algorithm which achieves sub-linear cumulative regret in the classical multi-armed bandits problem, whether or not Thompson sampling can identify the optimal action remains unknown. This is because a regret-optimal algorithm can potentially select a suboptimal arm infinitely often, hence failing to identify the optimal action. In this thesis, we show that Thompson sampling gradually determines the optimal arm with probability one whenever it achieves sub-linear regret, which is known to be the case in many classical bandits problems. Using this result, we introduce the first strongly consistent estimator for identifying the optimal action that uses only the actions selected by the Thompson sampling agent. Later we study the performance of Thompson sampling for Gaussian linear bandits. We improve the state-of-the-art regret bounds on the expected performance of Thompson sampling by an order of sqrt(log(T)) where T stands for the experiment duration. We achieve this result by introducing a novel Cauchy-Schwarz type inequality for random vectors. Finally we study the performance of Thompson sampling for the batched multi-armed bandits problem. Prior work has devised algorithms specialized for this batched setting that optimize the batch structure for a given time horizon T and prioritize exploration in the beginning of the experiment to eliminate suboptimal actions. It is not clear whether or not Thompson sampling, an algorithm that implicitly balances exploration and exploitation without knowing the time horizon, can perform well under batched feedback. In this thesis, we answer this question positively. We provide the first adversarial batching result in the literature by showing that Thompson sampling maintains its optimal performance even when the batch structure is chosen adversarially as long as it receives enough feedback. Additionally, we introduce two adaptive batching strategies tuned to a given target performance criteria: asymptotic or finite time performance. Both algorithms require only O(loglog(T)) number of batches to achieve optimal performance for a given problem instance, resulting in an exponentially smaller number of batches than previous algorithms. This opens the way to drastically parallelize the operation of Thompson sampling and reduce the number of times the agent needs to interact with the underlying system in many real-world applications.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2022; ©2022
Publication date	2022; 2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Kalkanli, Cem
Degree supervisor	Özgür, Ayfer
Thesis advisor	Özgür, Ayfer
Thesis advisor	Pilanci, Mert
Thesis advisor	Weissman, Tsachy
Degree committee member	Pilanci, Mert
Degree committee member	Weissman, Tsachy
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Cem Kalkanli.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/yt956wv3845

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

View in SearchWorks

Loading usage metrics...