Scaling deep robotic learning to broad real-world data
- From general object grasping to in-hand manipulation, deep learning has enabled a number of exciting robotic manipulation capabilities in recent years. Despite this progress, the quintessential home robot that can enter a previously unseen home environment and complete a wide range of tasks, as humans can, remains far from reality. While many problems stand between us and this goal, one central bottleneck lies in learning control policies from the robot's sensor inputs that generalize to new tasks, objects, and environments. For example, a robot cooking in a home cannot afford to re-learn from scratch for each new dish, nor is it feasible to hard-code state features for every new kitchen it might encounter.

One potential route to such generalization is to train the robot on a wide distribution of data spanning many tasks, objects, and environments. Indeed, this recipe of large, diverse datasets combined with scalable offline learning algorithms (e.g., self-supervised or cheaply supervised learning) has been the key behind recent successes in natural language processing (NLP) and vision. However, directly extending this recipe to robotics is nontrivial: we neither have sufficiently large and diverse datasets of robot interaction, nor is it obvious what types of learning algorithms or sources of supervision can scalably extract skills from such datasets.

The goal of this thesis is to tackle these challenges and replicate the recipe of large-scale data and learning in the context of robotic manipulation. The first part of this thesis will discuss how we can scalably collect large and diverse datasets of robots interacting in the physical world, and how we can effectively pre-train self-supervised world models on such offline robot datasets.
We will then explore how these pre-trained world models can be combined with planning to solve tasks: first long-horizon manipulation tasks, and second tasks specified in natural language. Finally, we will discuss how we might go beyond robot data and unlock the broad sources of data that exist on the web, such as videos of humans, to enable more effective learning in our robots, specifically through reward learning and visual pre-training. The thesis concludes by discussing open challenges, particularly how we might unify the paradigms of simulation, real-world data collection, and videos of humans to realize the vision of a general-purpose household robot.
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Computer Science Department
|Statement of responsibility
|Submitted to the Computer Science Department.
|Thesis (Ph.D.)--Stanford University, 2023.
- © 2023 by Suraj Nair
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).