Scaling deep robotic learning to broad real-world data


Abstract/Contents

Abstract
From general object grasping to in-hand manipulation, deep learning has enabled a number of exciting robotic manipulation capabilities in recent years. Despite this, the quintessential home robot that can enter a previously unseen home environment and complete a wide range of tasks like humans can is far from a reality. While there are many problems to solve in accomplishing this goal, one of the central bottlenecks lies in learning control policies from the robot's sensor inputs that can generalize to new tasks, objects, and environments. For example, a robot cooking in a home cannot afford to re-learn from scratch for each new dish, nor is it feasible to hard-code state features for every new kitchen a robot might encounter. One potential route to accomplishing this generalization is to train the robot on a wide distribution of data that contains many tasks, objects, and environments. Indeed, this recipe of large, diverse datasets combined with scalable offline learning algorithms (e.g. self-supervised or cheaply supervised learning) has been the key behind recent successes in natural language processing (NLP) and vision. However, directly extending this recipe to robotics is nontrivial, as we neither have sufficiently large and diverse datasets of robot interaction nor is it obvious what types of learning algorithms or sources of supervision can enable us to scalably learn skills from these datasets. The goal of this thesis lies in tackling these challenges, and replicating the recipe of large-scale data and learning in the context of robotic manipulation. The first part of this thesis will discuss how we can scalably collect large and diverse datasets of robots interacting in the physical world and how we can effectively pre-train self-supervised world models on such offline robot datasets. We'll then explore how we might use these pre-trained world models to solve tasks by combining them with planning, first for solving long-horizon manipulation tasks, and second for completing tasks specified by natural language. Finally, we'll discuss how we might go beyond robot data and unlock the broad sources of data that exist on the web, like videos of humans, to enable more effective learning in our robots, specifically through reward learning and visual pre-training. The thesis will conclude by discussing open challenges, particularly how we might unify the paradigms of simulation, real-world data collection, and videos of humans to realize the vision of a general-purpose household robot.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Nair, Suraj
Degree supervisor Finn, Chelsea
Thesis advisor Finn, Chelsea
Thesis advisor Sadigh, Dorsa
Degree committee member Sadigh, Dorsa
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Suraj Nair.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2023.
Location https://purl.stanford.edu/fk655fk4359

Access conditions

Copyright
© 2023 by Suraj Nair
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
