Building versatile reinforcement learning agents with offline data


Abstract/Contents

Abstract
Recent advances in machine learning with deep neural networks have shown significant success in learning from large datasets. However, these successes have concentrated on computer vision and natural language processing, while progress on sequential decision-making problems remains limited. Reinforcement learning (RL) methods are designed to solve such problems, yet they struggle to scale to many real-world applications because they rely on costly and potentially unsafe online trial and error and require the inefficient process of learning each skill individually from scratch. In this thesis, we present work on designing RL agents that are trained directly from offline data and are capable of mastering multiple skills, addressing the aforementioned challenges. In the first part of the thesis, we introduce an algorithm that learns performant policies from offline datasets and improves the generalization ability of offline RL agents by expanding the offline data with rollouts generated by learned dynamics models. We then extend the method to high-dimensional observation spaces such as images and show that it enables real-world robotic systems to perform manipulation tasks. In the second part of the thesis, to avoid learning each task from scratch as in prior RL work while retaining the benefits of offline learning, we discuss how RL agents can learn a variety of tasks from diverse offline data by sharing data across tasks. Moreover, we show that sharing data requires labeling the rewards of data from other tasks, which relies on heavy reward engineering and is labor-intensive. To tackle these issues, we describe how diverse unlabeled data can be effectively leveraged in offline RL, bypassing the challenge of reward labeling. Finally, we conclude by listing future directions, such as effective pre-training schemes with heterogeneous unlabeled offline datasets, online fine-tuning after offline pre-training, and lifelong learning with offline datasets.
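
The first contribution summarized above, expanding an offline dataset with rollouts from learned dynamics models, can be illustrated with a minimal sketch. The Python code below is a rough, hypothetical illustration of that general recipe, not the thesis' actual algorithm: an ensemble dynamics model branches short rollouts from states in the offline dataset, and the synthetic rewards are penalized by ensemble disagreement (a common conservatism heuristic in model-based offline RL, assumed here). All names (DynamicsEnsemble, augment_with_rollouts, penalty_coef) are illustrative assumptions, not identifiers from the thesis.

# Minimal sketch (illustrative only): expand an offline dataset with short
# rollouts from a learned dynamics model, penalizing rewards by model uncertainty.
import numpy as np

class DynamicsEnsemble:
    """Stand-in for a learned ensemble dynamics model p(s', r | s, a)."""
    def __init__(self, n_members, obs_dim, act_dim, rng):
        # Toy linear members; in practice each member would be a neural network
        # fit to the offline transitions.
        self.W = rng.normal(scale=0.1, size=(n_members, obs_dim + act_dim, obs_dim + 1))

    def predict(self, obs, act):
        x = np.concatenate([obs, act], axis=-1)            # (B, obs+act)
        preds = np.einsum('bd,mdo->mbo', x, self.W)        # (M, B, obs+1)
        mean = preds.mean(axis=0)
        # Disagreement across ensemble members serves as an uncertainty estimate.
        uncertainty = np.linalg.norm(preds.std(axis=0), axis=-1)  # (B,)
        next_obs, reward = mean[:, :-1], mean[:, -1]
        return next_obs, reward, uncertainty

def augment_with_rollouts(dataset, model, policy, horizon=5, n_starts=256,
                          penalty_coef=1.0, rng=None):
    """Branch short model rollouts from states in the offline dataset and
    return them as extra (s, a, r - penalty, s') transitions."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.integers(len(dataset['obs']), size=n_starts)
    obs = dataset['obs'][idx]
    synthetic = {'obs': [], 'act': [], 'rew': [], 'next_obs': []}
    for _ in range(horizon):
        act = policy(obs)
        next_obs, rew, unc = model.predict(obs, act)
        synthetic['obs'].append(obs)
        synthetic['act'].append(act)
        # Conservative reward: subtract an uncertainty penalty so the agent
        # cannot exploit model errors far from the data.
        synthetic['rew'].append(rew - penalty_coef * unc)
        synthetic['next_obs'].append(next_obs)
        obs = next_obs
    return {k: np.concatenate(v, axis=0) for k, v in synthetic.items()}

# Usage: mix these synthetic transitions with the real offline data when
# training any off-policy RL agent.
rng = np.random.default_rng(0)
obs_dim, act_dim, n = 4, 2, 1000
dataset = {'obs': rng.normal(size=(n, obs_dim)),
           'act': rng.normal(size=(n, act_dim)),
           'rew': rng.normal(size=(n,)),
           'next_obs': rng.normal(size=(n, obs_dim))}
model = DynamicsEnsemble(n_members=5, obs_dim=obs_dim, act_dim=act_dim, rng=rng)
policy = lambda s: np.tanh(s[:, :act_dim])                 # placeholder policy
extra = augment_with_rollouts(dataset, model, policy, rng=rng)
print({k: v.shape for k, v in extra.items()})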

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Yu, Tianhe
Degree supervisor Finn, Chelsea
Thesis advisor Finn, Chelsea
Thesis advisor Ermon, Stefano
Thesis advisor Sadigh, Dorsa
Degree committee member Ermon, Stefano
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Tianhe Yu.
Note Submitted to the Computer Science Department.
Thesis Ph.D., Stanford University, 2022.
Location https://purl.stanford.edu/pm875yj1993

Access conditions

Copyright
© 2022 by Tianhe Yu
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
