Reinforcement learning: when can we do sample-efficient exploration?
Abstract/Contents
- Abstract
- The performance of Reinforcement Learning (RL) agents varies wildly with the underlying problem: sometimes they learn quickly, other times slowly, and sometimes not at all. When can an RL agent quickly learn good policies? To provide a theoretically justified answer, this dissertation investigates four fundamental RL questions. 1) RL agents often learn much faster than theory predicts. How common are easy instances, and how can an algorithm systematically exploit them? We prove that several RL problems of practical interest are learnable much faster than worst-case analyses suggest. This is achieved by an agnostic, instance-adaptive algorithm that leverages benign instances to accelerate learning while preserving state-of-the-art worst-case minimax guarantees. 2) Randomized agents often perform better empirically than their deterministic counterparts. Is their performance grounded in solid theory? We provide an affirmative answer in the context of function approximation, giving the first regret bounds for randomized exploration with function approximation and placing randomized agents on solid theoretical footing. 3) Providing formal online learning guarantees with function approximation requires more stringent assumptions than are necessary in batch RL; can these additional requirements be relaxed, or is the gap fundamental? We give an encouraging answer by designing and analyzing a minimax-optimal online algorithm that works under Bellman closure of the function class, as is commonly assumed in offline batch analyses of RL methods with function approximation. As a special case, this yields the first misspecification-robust algorithm and analysis for misspecified linear contextual bandits. 4) What are the fundamental limits of batch reinforcement learning? Is there a fundamental sample-complexity separation between online and batch RL? The dissertation establishes the first exponential lower bounds for batch RL, even with simple linear predictors, and provides the first formal exponential separation showing that batch RL can be exponentially harder than online RL.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Zanette, Andrea |
---|---|
Degree supervisor | Kochenderfer, Mykel J., 1980-
Thesis advisor | Kochenderfer, Mykel J., 1980-
Thesis advisor | Brunskill, Emma |
Thesis advisor | Lazaric, Alessandro |
Thesis advisor | Sidford, Aaron |
Degree committee member | Brunskill, Emma |
Degree committee member | Lazaric, Alessandro |
Degree committee member | Sidford, Aaron |
Associated with | Stanford University, Institute for Computational and Mathematical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Andrea Zanette. |
---|---|
Note | Submitted to the Institute for Computational and Mathematical Engineering. |
Thesis | Thesis (Ph.D.)--Stanford University, 2021.
Location | https://purl.stanford.edu/ns268vs3612 |
Access conditions
- Copyright
- © 2021 by Andrea Zanette
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).