Efficient learning in sequential optimization


Abstract/Contents

Abstract
We consider a broad class of online optimization problems in which a decision-maker must balance exploration and exploitation while learning from partial feedback. In these problems, the decision-maker repeatedly chooses among a set of possible actions, observes an outcome, and receives a reward representing the utility derived from that outcome. She is uncertain about the underlying system and is therefore initially unsure which action is best. However, as outcomes are observed, she can learn over time to make increasingly effective decisions. Her objective is to choose actions sequentially so as to maximize expected cumulative reward. We focus on three algorithmic approaches that accommodate flexible statistical modeling and are capable of experimenting efficiently across broad classes of problems.

The first part of the thesis focuses on a design principle known as optimism in the face of uncertainty, which underlies many of the most effective exploration algorithms. We provide a regret bound for an optimistic algorithm that applies broadly and can be specialized to many specific model classes. Our bound depends on a new notion of dimension that measures the degree of dependence among actions. We compare this notion to the Vapnik-Chervonenkis dimension and explain why it and other measures of dimension used in the supervised learning literature do not suffice for analyzing optimistic algorithms. We then turn our attention to Thompson sampling, an elegant algorithm for learning in online optimization problems with partial feedback. We derive a close theoretical connection between Thompson sampling and optimistic algorithms; through this connection, existing analyses of specific optimistic algorithms translate immediately into expected regret bounds for Thompson sampling.

The second part of the thesis pushes beyond the optimistic principle and offers a fresh, information-theoretic perspective on the exploration/exploitation tradeoff. We first revisit Thompson sampling from this perspective and provide novel regret bounds that scale with the entropy of the optimal action distribution. We then propose a new algorithm--information-directed sampling (IDS)--and study its performance. IDS quantifies the amount learned by selecting an action through an information-theoretic measure, and selects actions by optimizing an objective that explicitly balances attaining high immediate reward and selecting informative actions. We provide a general regret bound for IDS, demonstrate strong performance in simulation, and show through simple analytic examples that it can dramatically outperform Thompson sampling due to the way it quantifies information.
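For concreteness, the sketch below illustrates Thompson sampling, one of the algorithms studied in the thesis, on an independent-armed Bernoulli bandit. The Beta-Bernoulli model, the arm means, and the function name are illustrative assumptions introduced here and are not taken from the thesis, which treats far more general model classes.

import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Minimal Thompson sampling sketch for a Bernoulli bandit with independent arms.

    Each arm's mean gets a Beta(1, 1) prior; after every pull the posterior is
    updated by conjugacy, and the next arm is chosen by sampling a plausible
    mean from each posterior and playing the argmax.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    successes = np.ones(n_arms)   # Beta posterior alpha parameters
    failures = np.ones(n_arms)    # Beta posterior beta parameters
    total_reward = 0.0

    for _ in range(horizon):
        # Posterior sampling: draw one candidate mean per arm.
        sampled_means = rng.beta(successes, failures)
        arm = int(np.argmax(sampled_means))

        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = rng.random() < true_means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward

    return total_reward

# Hypothetical arm means for illustration only; the third arm is best.
print(thompson_sampling_bernoulli([0.3, 0.5, 0.7], horizon=10_000))

Because the algorithm plays each arm with the posterior probability that it is optimal, pulls concentrate on the best arm as evidence accumulates; the optimistic and information-directed methods described above replace this sampling rule with, respectively, an upper-confidence-style choice and an explicit trade-off between immediate reward and information gained.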

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Russo, Daniel
Associated with Stanford University, Department of Management Science and Engineering.
Primary advisor Van Roy, Benjamin
Thesis advisor Johari, Ramesh, 1976-
Thesis advisor Tse, David

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Daniel Russo.
Note Submitted to the Department of Management Science and Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Daniel Joseph Russo
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
