Meta-reinforcement learning : algorithms and applications

Abstract/Contents

Abstract
Reinforcement learning from scratch often requires a tremendous number of samples to learn complex tasks, but many real-world applications demand learning from only a few samples. For example, an effective news recommendation system must be able to adapt to the tastes of a new user after observing the results of only a few recommendations. To meet the demands of such applications that require quickly learning or adapting to new tasks, this thesis focuses on meta-reinforcement learning (meta-RL). Specifically, we consider a setting where the agent is repeatedly presented with new tasks, all drawn from some related task family. The agent must learn each new task in only a few shots, formalized as a few episodes of interaction with the task. How the agent spends its few shots critically determines whether it will be able to subsequently solve the task, but learning to effectively use the few shots is challenging because there is no direct supervision. In this thesis, we argue that effectively leveraging the few shots — and hence, learning to quickly solve new tasks — requires carefully decoupling learning to spend the few shots from learning to solve the task. Concretely, we show that existing meta-RL algorithms that do not decouple the two struggle to learn complex strategies for spending the few shots due to a chicken-and-egg problem, where learning to effectively spend the few shots depends on having already learned to solve the task and vice-versa. We then address this problem with a new algorithm called Dream that decouples the two. We also study how to leverage pre-collected offline data in this setting. We show that popular approaches for extracting skills from the offline data to quickly learn new tasks use an underspecified objective with degenerate solutions, and address this with an auxiliary objective that makes the optimization problem well-specified. Our algorithms enable previously unexplored applications of meta-RL.
Specifically, we show that (1) Dream enables a new paradigm for language learning without large text datasets by learning language in the process of solving tasks that do not necessarily require language. For example, in our experiments, Dream learns to read building floor plans with language descriptions in the process of learning to navigate to particular offices in various buildings; and (2) Dream can help automatically grade interactive computer science assignments that typically require significant manual grading. We deployed Dream to assist with grading the Breakout assignment in Stanford's introductory computer science course and found that it sped up grading by 28%, corresponding to about 10 hours, without sacrificing accuracy.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Liu, Evan Zheran
Degree supervisor Finn, Chelsea
Thesis advisor Finn, Chelsea
Thesis advisor Brunskill, Emma
Thesis advisor Liang, Percy
Degree committee member Brunskill, Emma
Degree committee member Liang, Percy
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Evan Zheran Liu.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2023.
Location https://purl.stanford.edu/zf342ty7446

Access conditions

Copyright
© 2023 by Evan Zheran Liu
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
