Building versatile reinforcement learning agents with offline data
- Recent advances in machine learning with deep neural networks have produced significant successes in learning from large datasets. However, these successes have been concentrated in computer vision and natural language processing, while progress on sequential decision-making problems remains limited. Reinforcement learning (RL) methods are designed to solve such problems but struggle to scale to many real-world applications, because they rely on costly and potentially unsafe online trial and error and require learning each skill individually from scratch, which is inefficient. In this thesis, we present work on designing RL agents that are trained directly from offline data and are capable of mastering multiple skills, addressing both challenges. In the first part of the thesis, we introduce an algorithm that learns performant policies from offline datasets and improves the generalization of offline RL agents by expanding the offline data with rollouts generated by learned dynamics models. We then extend the method to high-dimensional observation spaces such as images and show that it enables real-world robotic systems to perform manipulation tasks. In the second part of the thesis, to avoid learning each task from scratch, as prior RL work does, while retaining the benefits of offline learning, we discuss how RL agents can learn a variety of tasks from diverse offline data by sharing data across tasks. Moreover, we show that sharing data requires labeling the rewards of data from other tasks, which relies on heavy reward engineering and is labor-intensive. To tackle these issues, we describe how to effectively leverage diverse unlabeled data in offline RL, bypassing the challenge of reward labeling.
Finally, we conclude by listing future directions, such as effective pre-training schemes with heterogeneous unlabeled offline datasets, online fine-tuning after offline pre-training, and lifelong learning with offline datasets.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Degree committee member
|Stanford University, Computer Science Department
|Statement of responsibility
|Submitted to the Computer Science Department.
|Thesis Ph.D. Stanford University 2022.
- © 2022 by Tianhe Yu
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).