Meta-reinforcement learning : algorithms and applications
- Reinforcement learning from scratch often requires a tremendous number of samples to learn complex tasks, but many real-world applications demand learning from only a few samples. For example, an effective news recommendation system must be able to adapt to the tastes of a new user after observing the results of only a few recommendations. To meet the demands of such applications that require quickly learning or adapting to new tasks, this thesis focuses on meta-reinforcement learning (meta-RL). Specifically, we consider a setting where the agent is repeatedly presented with new tasks, all drawn from some related task family. The agent must learn each new task in only a few shots, formalized as a few episodes of interaction with the task. How the agent spends its few shots critically determines whether it will be able to subsequently solve the task, but learning to effectively use the few shots is challenging because there is no direct supervision. In this thesis, we argue that effectively leveraging the few shots, and hence learning to quickly solve new tasks, requires carefully decoupling learning to spend the few shots from learning to solve the task. Concretely, we show that existing meta-RL algorithms that do not decouple the two struggle to learn complex strategies for spending the few shots due to a chicken-and-egg problem, where learning to effectively spend the few shots depends on having already learned to solve the task and vice versa. We then address this problem with a new algorithm called Dream that decouples the two. We also study how to leverage pre-collected offline data in this setting. We show that popular approaches for extracting skills from the offline data to quickly learn new tasks use an underspecified objective with degenerate solutions, and address this with an auxiliary objective that makes the optimization problem well-specified. Our algorithms enable previously unexplored applications with meta-RL.
Specifically, we show that (1) Dream enables a new paradigm for language learning without large text datasets by learning language in the process of solving tasks that do not necessarily require language. For example, in our experiments, Dream learns to read building floor plans with language descriptions in the process of learning to navigate to particular offices in various buildings; and (2) Dream can help automatically grade interactive computer science assignments that typically require significant manual grading. We deployed Dream to assist with grading the Breakout assignment in Stanford's introductory computer science course and found that it sped up grading by 28%, corresponding to about 10 hours, without sacrificing accuracy.
|Type of resource: electronic resource; remote; computer; online resource
|1 online resource.
|Liu, Evan Zheran
|Degree committee member
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Computer Science Department
|Statement of responsibility: Evan Zheran Liu.
|Submitted to the Computer Science Department.
|Thesis Ph.D. Stanford University 2023.
- © 2023 by Evan Zheran Liu
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).