Reinforcement learning : when can we do sample efficient exploration?

Abstract/Contents

Abstract
The performance of Reinforcement Learning (RL) agents varies wildly with the underlying problem: sometimes they learn quickly, sometimes slowly, and sometimes not at all. When can an RL agent quickly learn a good policy? To provide a theoretically justified answer, this dissertation investigates four fundamental RL questions:

1) RL agents often learn much faster than theory predicts. How common are easy instances, and how can an algorithm systematically exploit them? We prove that several RL problems of practical interest are learnable much faster than worst-case analyses suggest. This is achieved by an agnostic, instance-adaptive algorithm that leverages benign instances to accelerate learning while preserving state-of-the-art worst-case minimax guarantees.

2) Randomized agents often perform better empirically than their deterministic counterparts. Is their performance grounded in solid theory? We provide an affirmative answer in the context of function approximation, giving the first regret bounds for randomized exploration with function approximation and placing randomized agents on solid theoretical footing.

3) Providing formal online learning guarantees with function approximation requires more stringent assumptions than batch RL does; can these additional requirements be relaxed, or is the gap fundamental? We give an encouraging answer by designing and analyzing a minimax-optimal online algorithm that works under Bellman closure of the function class, the assumption commonly needed in offline (batch) analyses of RL methods with function approximation. As a special case, this yields the first misspecification-robust algorithm and analysis for misspecified linear contextual bandits.

4) What are the fundamental limits of batch reinforcement learning? Is there a fundamental sample-complexity separation between online and batch RL? The dissertation establishes the first exponential lower bounds for batch RL, even for simple linear predictors, and provides the first formal exponential separation showing that batch RL can be exponentially harder than online RL.
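For reference, here is a minimal LaTeX sketch of two standard notions the abstract leans on: episodic regret (points 1–3) and Bellman closure of a function class (point 3). The notation follows the common textbook convention and is an assumption of this sketch, not taken from the thesis itself.

```latex
% Regret of a learner that plays policy \pi_k in episode k, measured
% against the optimal value V^*_1 from the initial state s_1^k:
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}
  \Big( V^{*}_{1}(s_1^k) - V^{\pi_k}_{1}(s_1^k) \Big).

% Bellman closure (completeness) of a function class \mathcal{F}:
% the Bellman backup of any member of the class stays in the class,
% \forall f \in \mathcal{F},\; \forall h:\quad \mathcal{T}_h f \in \mathcal{F},
% where the backup operator is
(\mathcal{T}_h f)(s,a) \;=\; r_h(s,a)
  + \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\Big[ \max_{a'} f(s', a') \Big].
```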
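To make point 2 concrete, the following is a minimal, hypothetical Python sketch of randomized exploration with linear function approximation in its simplest special case, a linear contextual bandit: each round fits a ridge regression and acts greedily with respect to a Gaussian-perturbed sample of the estimate (a Thompson-sampling/RLSVI-style scheme). The toy environment, variable names, and parameter choices are illustrative assumptions, not the algorithms analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (hypothetical): linear bandit with fixed action features.
d, n_actions, T = 5, 4, 2000
theta_star = rng.normal(size=d) / np.sqrt(d)   # unknown true parameter
features = rng.normal(size=(n_actions, d))     # one feature vector per action

def reward(a):
    """Noisy linear reward for pulling action a."""
    return features[a] @ theta_star + 0.1 * rng.normal()

# Randomized exploration via perturbed ridge regression.
lam, sigma = 1.0, 0.1
A = lam * np.eye(d)   # regularized Gram matrix
b = np.zeros(d)       # accumulated feature-weighted rewards

total_reward = 0.0
for t in range(T):
    mean = np.linalg.solve(A, b)                      # ridge point estimate
    cov = sigma**2 * np.linalg.inv(A)                 # perturbation covariance
    theta_tilde = rng.multivariate_normal(mean, cov)  # sampled parameter
    a = int(np.argmax(features @ theta_tilde))        # greedy w.r.t. the sample
    r = reward(a)
    total_reward += r
    A += np.outer(features[a], features[a])           # rank-one Gram update
    b += r * features[a]

print(f"average reward: {total_reward / T:.3f}")
print(f"best possible : {np.max(features @ theta_star):.3f}")
```

The randomness injected through `theta_tilde` is what drives exploration: actions that look suboptimal under the point estimate still get tried when the perturbation is large along poorly explored directions, and the perturbation shrinks as the Gram matrix grows.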

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Zanette, Andrea
Degree supervisor Kochenderfer, Mykel J., 1980-
Thesis advisor Kochenderfer, Mykel J., 1980-
Thesis advisor Brunskill, Emma
Thesis advisor Lazaric, Alessandro
Thesis advisor Sidford, Aaron
Degree committee member Brunskill, Emma
Degree committee member Lazaric, Alessandro
Degree committee member Sidford, Aaron
Associated with Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Andrea Zanette.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/ns268vs3612

Access conditions

Copyright
© 2021 by Andrea Zanette
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
