Building versatile reinforcement learning agents with offline data


Abstract/Contents

Abstract
Recent advances in machine learning with deep neural networks have shown significant success in learning from large datasets. However, these successes have concentrated on computer vision and natural language processing, while progress on sequential decision-making problems remains limited. Reinforcement learning (RL) methods are designed to solve such problems, yet they struggle to scale to many real-world applications because they rely on costly and potentially unsafe online trial and error and require the inefficient process of learning each skill individually from scratch. In this thesis, we present work on designing RL agents that are trained directly from offline data and are capable of mastering multiple skills, addressing the aforementioned challenges. In the first part of the thesis, we introduce an algorithm that learns performant policies from offline datasets and improves the generalization ability of offline RL agents by expanding the offline data with rollouts generated by learned dynamics models. We then extend the method to high-dimensional observation spaces such as images and show that it enables real-world robotic systems to perform manipulation tasks. In the second part of the thesis, to avoid learning each task from scratch as in prior RL work while retaining the benefits of offline learning, we discuss how RL agents can learn a variety of tasks from diverse offline data by sharing data across tasks. Moreover, we show that sharing data requires labeling the rewards of data from other tasks, which relies on heavy reward engineering and is labor-intensive. To tackle these issues, we describe how diverse unlabeled data can be effectively leveraged in offline RL, bypassing the challenge of reward labeling. Finally, we conclude by listing future directions, such as effective pre-training schemes with heterogeneous unlabeled offline datasets, online fine-tuning after offline pre-training, and lifelong learning with offline datasets.
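
The first contribution summarized above, expanding an offline dataset with rollouts from learned dynamics models, can be illustrated with a minimal sketch. The Python code below is a rough, hypothetical illustration of that general recipe, not the thesis' actual algorithm: an ensemble dynamics model branches short rollouts from states in the offline dataset, and the synthetic rewards are penalized by ensemble disagreement (a common conservatism heuristic in model-based offline RL, assumed here). All names (DynamicsEnsemble, augment_with_rollouts, penalty_coef) are illustrative assumptions, not identifiers from the thesis.

# Minimal sketch (illustrative only): expand an offline dataset with short
# rollouts from a learned dynamics model, penalizing rewards by model uncertainty.
import numpy as np

class DynamicsEnsemble:
    """Stand-in for a learned ensemble dynamics model p(s', r | s, a)."""
    def __init__(self, n_members, obs_dim, act_dim, rng):
        # Toy linear members; in practice each member would be a neural network
        # fit to the offline transitions.
        self.W = rng.normal(scale=0.1, size=(n_members, obs_dim + act_dim, obs_dim + 1))

    def predict(self, obs, act):
        x = np.concatenate([obs, act], axis=-1)            # (B, obs+act)
        preds = np.einsum('bd,mdo->mbo', x, self.W)        # (M, B, obs+1)
        mean = preds.mean(axis=0)
        # Disagreement across ensemble members serves as an uncertainty estimate.
        uncertainty = np.linalg.norm(preds.std(axis=0), axis=-1)  # (B,)
        next_obs, reward = mean[:, :-1], mean[:, -1]
        return next_obs, reward, uncertainty

def augment_with_rollouts(dataset, model, policy, horizon=5, n_starts=256,
                          penalty_coef=1.0, rng=None):
    """Branch short model rollouts from states in the offline dataset and
    return them as extra (s, a, r - penalty, s') transitions."""
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.integers(len(dataset['obs']), size=n_starts)
    obs = dataset['obs'][idx]
    synthetic = {'obs': [], 'act': [], 'rew': [], 'next_obs': []}
    for _ in range(horizon):
        act = policy(obs)
        next_obs, rew, unc = model.predict(obs, act)
        synthetic['obs'].append(obs)
        synthetic['act'].append(act)
        # Conservative reward: subtract an uncertainty penalty so the agent
        # cannot exploit model errors far from the data.
        synthetic['rew'].append(rew - penalty_coef * unc)
        synthetic['next_obs'].append(next_obs)
        obs = next_obs
    return {k: np.concatenate(v, axis=0) for k, v in synthetic.items()}

# Usage: mix these synthetic transitions with the real offline data when
# training any off-policy RL agent.
rng = np.random.default_rng(0)
obs_dim, act_dim, n = 4, 2, 1000
dataset = {'obs': rng.normal(size=(n, obs_dim)),
           'act': rng.normal(size=(n, act_dim)),
           'rew': rng.normal(size=(n,)),
           'next_obs': rng.normal(size=(n, obs_dim))}
model = DynamicsEnsemble(n_members=5, obs_dim=obs_dim, act_dim=act_dim, rng=rng)
policy = lambda s: np.tanh(s[:, :act_dim])                 # placeholder policy
extra = augment_with_rollouts(dataset, model, policy, rng=rng)
print({k: v.shape for k, v in extra.items()})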

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Yu, Tianhe
Degree supervisor Finn, Chelsea
Thesis advisor Finn, Chelsea
Thesis advisor Ermon, Stefano
Thesis advisor Sadigh, Dorsa
Degree committee member Ermon, Stefano
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Tianhe Yu.
Note Submitted to the Computer Science Department.
Thesis Ph.D., Stanford University, 2022.
Location https://purl.stanford.edu/pm875yj1993

Access conditions

Copyright
© 2022 by Tianhe Yu
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
