Compositional reasoning in robot learning


Abstract/Contents

Abstract
To carry out diverse tasks in everyday human environments, future robots must generalize beyond the knowledge they are equipped with. However, despite recent advances in "end-to-end" deep learning, today's robot learning methods are still limited to specializing in one task at a time. Humans, in contrast, perform everyday tasks with ease: rather than learning each task in isolation, we distill reusable abstractions from our daily experiences and solve new tasks by composing known building blocks. Such compositional reasoning capability is crucial for developing future robots that are both competent and versatile. This two-part thesis presents a spectrum of techniques for building compositional generalization capabilities into robot learning systems. Part I draws on prominent ideas from classical structured approaches in AI, such as program induction and graphical models, and designs algorithms with strong representational priors. Neural Task Programming (NTP) represents task-subtask hierarchies using modular neural programs. NTP is designed to simultaneously exploit the representational capacity of neural networks for handling unstructured input and the compositional nature of programs for few-shot generalization. Similarly, Neural Task Graphs (NTGs) represent manipulation skills as nodes in a graph and leverage structure in skill preconditions for generalization. Regression Planning Network (RPN) further dissects the goal specification by grounding symbolic goals in object-centric representations, achieving zero-shot generalization to new task goals. To bring compositional reasoning closer to real-world settings, Part II of this thesis introduces our recent efforts to relax the strong representational prior assumptions: learning from unstructured video demonstrations and learning through trial and error.
We first describe a framework for simultaneously learning complex visuomotor skills and discovering implicit compositional structures from human demonstrations. Experiments show that the approach enables a physical robot to perform long-horizon manipulation tasks in a kitchen setup. We then further relax the assumption of having demonstration data and enable a robot to learn compact planning representations by characterizing the pre- and post-conditions of motion primitives through active interaction.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Xu, Danfei
Degree supervisor Li, Fei-Fei, 1976-
Degree supervisor Savarese, Silvio
Thesis advisor Li, Fei-Fei, 1976-
Thesis advisor Savarese, Silvio
Thesis advisor Bohg, Jeannette, 1981-
Thesis advisor Sadigh, Dorsa
Degree committee member Bohg, Jeannette, 1981-
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Danfei Xu.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/ys432pr6718

Access conditions

Copyright
© 2021 by Danfei Xu
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
