Compositional reasoning in robot learning


Abstract/Contents

Abstract
To carry out diverse tasks in everyday human environments, future robots must generalize beyond the knowledge they are equipped with. However, despite recent advances in "end-to-end" deep learning, today's robot learning methods are still limited to specializing in one task at a time. Humans, in contrast, perform everyday tasks with ease: rather than learning each task in isolation, we distill reusable abstractions from our daily experiences and solve new tasks by composing known building blocks. Such compositional reasoning capability is crucial for developing future robots that are both competent and versatile. This two-part thesis presents a spectrum of techniques for building compositional generalization capabilities into robot learning systems. Part I draws on prominent ideas from classical structured approaches in AI, such as program induction and graphical models, and designs algorithms with strong representational priors. Neural Task Programming (NTP) represents task-subtask hierarchies using modular neural programs. NTP is designed to simultaneously exploit the representational capacity of neural networks for handling unstructured input and the compositional nature of programs for few-shot generalization. Similarly, Neural Task Graphs (NTGs) represent manipulation skills as nodes in a graph and leverage structure in skill preconditions for generalization. Regression Planning Network (RPN) further dissects the goal specification by grounding symbolic goals in object-centric representations, achieving zero-shot generalization to new task goals. To bring compositional reasoning closer to real-world settings, Part II of this thesis introduces our recent efforts to relax the strong representational prior assumptions: learning from unstructured video demonstrations and learning through trial and error.
We first describe a framework for simultaneously learning complex visuomotor skills and discovering implicit compositional structures from human demonstrations. Experiments show that the approach enables a physical robot to perform long-horizon manipulation tasks in a kitchen setup. We then further relax the assumption of having demonstration data and enable a robot to learn compact planning representations by characterizing the pre- and post-conditions of motion primitives through active interaction.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021; 2021
Issuance monographic
Language English

Creators/Contributors

Author Xu, Danfei
Degree supervisor Li, Fei-Fei, 1976-
Degree supervisor Savarese, Silvio
Thesis advisor Li, Fei-Fei, 1976-
Thesis advisor Savarese, Silvio
Thesis advisor Bohg, Jeannette, 1981-
Thesis advisor Sadigh, Dorsa
Degree committee member Bohg, Jeannette, 1981-
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Danfei Xu.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/ys432pr6718

Access conditions

Copyright
© 2021 by Danfei Xu
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
