Meta-reinforcement learning : algorithms and applications

Liu, Evan Zheran

Abstract/Contents

Abstract: Reinforcement learning from scratch often requires a tremendous number of samples to learn complex tasks, but many real-world applications demand learning from only a few samples. For example, an effective news recommendation system must be able to adapt to the tastes of a new user after observing the results of only a few recommendations. To meet the demands of such applications, which require quickly learning or adapting to new tasks, this thesis focuses on meta-reinforcement learning (meta-RL). Specifically, we consider a setting where the agent is repeatedly presented with new tasks, all drawn from some related task family. The agent must learn each new task in only a few shots, formalized as a few episodes of interaction with the task. How the agent spends its few shots critically determines whether it will be able to subsequently solve the task, but learning to use the few shots effectively is challenging because there is no direct supervision. In this thesis, we argue that effectively leveraging the few shots, and hence learning to quickly solve new tasks, requires carefully decoupling learning to spend the few shots from learning to solve the task. Concretely, we show that existing meta-RL algorithms that do not decouple the two struggle to learn complex strategies for spending the few shots due to a chicken-and-egg problem, where learning to effectively spend the few shots depends on having already learned to solve the task, and vice versa. We then address this problem with a new algorithm called Dream that decouples the two. We also study how to leverage pre-collected offline data in this setting. We show that popular approaches for extracting skills from offline data to quickly learn new tasks use an underspecified objective with degenerate solutions, and we address this with an auxiliary objective that makes the optimization problem well-specified. Our algorithms enable previously unexplored applications of meta-RL.
Specifically, we show that (1) Dream enables a new paradigm for language learning without large text datasets by learning language in the process of solving tasks that do not necessarily require language. For example, in our experiments, Dream learns to read building floor plans with language descriptions in the process of learning to navigate to particular offices in various buildings; and (2) Dream can help automatically grade interactive computer science assignments that typically require significant manual grading. We deployed Dream to assist with grading the Breakout assignment in Stanford's introductory computer science course and found that it sped up grading by 28%, corresponding to about 10 hours, without sacrificing accuracy.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Liu, Evan Zheran
Degree supervisor	Finn, Chelsea
Thesis advisor	Finn, Chelsea
Thesis advisor	Brunskill, Emma
Thesis advisor	Liang, Percy
Degree committee member	Brunskill, Emma
Degree committee member	Liang, Percy
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Evan Zheran Liu.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/zf342ty7446

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
