Reinforcement learning : when can we do sample efficient exploration?

Abstract/Contents

Abstract
The performance of Reinforcement Learning (RL) agents varies wildly with the underlying problem: sometimes they learn quickly, sometimes slowly, and sometimes not at all. When can an RL agent quickly learn a good policy? To provide a theoretically justified answer, this dissertation investigates four fundamental RL questions:

1) RL agents often learn much faster than theory predicts. How common are easy instances, and how can an algorithm systematically exploit them? We prove that several RL problems of practical interest are learnable much faster than worst-case analyses suggest. This is achieved by an agnostic, instance-adaptive algorithm that leverages benign instances to accelerate learning while preserving state-of-the-art worst-case minimax guarantees.

2) Randomized agents often perform better empirically than their deterministic counterparts. Is their performance grounded in solid theory? We provide an affirmative answer in the context of function approximation, giving the first regret bounds for randomized exploration with function approximation and placing randomized agents on solid theoretical footing.

3) Providing formal online learning guarantees with function approximation requires more stringent assumptions than batch RL does; can these additional requirements be relaxed, or is the gap fundamental? We give an encouraging answer by designing and analyzing a minimax-optimal online algorithm that works under Bellman closure of the function class, the assumption commonly needed in offline (batch) analyses of RL methods with function approximation. As a special case, this yields the first misspecification-robust algorithm and analysis for misspecified linear contextual bandits.

4) What are the fundamental limits of batch reinforcement learning? Is there a fundamental sample-complexity separation between online and batch RL? The dissertation establishes the first exponential lower bounds for batch RL, even for simple linear predictors, and provides the first formal exponential separation showing that batch RL can be exponentially harder than online RL.
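For reference, here is a minimal LaTeX sketch of two standard notions the abstract leans on: episodic regret (points 1–3) and Bellman closure of a function class (point 3). The notation follows the common textbook convention and is an assumption of this sketch, not taken from the thesis itself.

```latex
% Regret of a learner that plays policy \pi_k in episode k, measured
% against the optimal value V^*_1 from the initial state s_1^k:
\mathrm{Regret}(K) \;=\; \sum_{k=1}^{K}
  \Big( V^{*}_{1}(s_1^k) - V^{\pi_k}_{1}(s_1^k) \Big).

% Bellman closure (completeness) of a function class \mathcal{F}:
% the Bellman backup of any member of the class stays in the class,
% \forall f \in \mathcal{F},\; \forall h:\quad \mathcal{T}_h f \in \mathcal{F},
% where the backup operator is
(\mathcal{T}_h f)(s,a) \;=\; r_h(s,a)
  + \mathbb{E}_{s' \sim P_h(\cdot \mid s,a)}\Big[ \max_{a'} f(s', a') \Big].
```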
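To make point 2 concrete, the following is a minimal, hypothetical Python sketch of randomized exploration with linear function approximation in its simplest special case, a linear contextual bandit: each round fits a ridge regression and acts greedily with respect to a Gaussian-perturbed sample of the estimate (a Thompson-sampling/RLSVI-style scheme). The toy environment, variable names, and parameter choices are illustrative assumptions, not the algorithms analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (hypothetical): linear bandit with fixed action features.
d, n_actions, T = 5, 4, 2000
theta_star = rng.normal(size=d) / np.sqrt(d)   # unknown true parameter
features = rng.normal(size=(n_actions, d))     # one feature vector per action

def reward(a):
    """Noisy linear reward for pulling action a."""
    return features[a] @ theta_star + 0.1 * rng.normal()

# Randomized exploration via perturbed ridge regression.
lam, sigma = 1.0, 0.1
A = lam * np.eye(d)   # regularized Gram matrix
b = np.zeros(d)       # accumulated feature-weighted rewards

total_reward = 0.0
for t in range(T):
    mean = np.linalg.solve(A, b)                      # ridge point estimate
    cov = sigma**2 * np.linalg.inv(A)                 # perturbation covariance
    theta_tilde = rng.multivariate_normal(mean, cov)  # sampled parameter
    a = int(np.argmax(features @ theta_tilde))        # greedy w.r.t. the sample
    r = reward(a)
    total_reward += r
    A += np.outer(features[a], features[a])           # rank-one Gram update
    b += r * features[a]

print(f"average reward: {total_reward / T:.3f}")
print(f"best possible : {np.max(features @ theta_star):.3f}")
```

The randomness injected through `theta_tilde` is what drives exploration: actions that look suboptimal under the point estimate still get tried when the perturbation is large along poorly explored directions, and the perturbation shrinks as the Gram matrix grows.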

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Zanette, Andrea
Degree supervisor Kochenderfer, Mykel J., 1980-
Thesis advisor Kochenderfer, Mykel J., 1980-
Thesis advisor Brunskill, Emma
Thesis advisor Lazaric, Alessandro
Thesis advisor Sidford, Aaron
Degree committee member Brunskill, Emma
Degree committee member Lazaric, Alessandro
Degree committee member Sidford, Aaron
Associated with Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Andrea Zanette.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/ns268vs3612

Access conditions

Copyright
© 2021 by Andrea Zanette
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
