Learning from imperfect demonstrations

Cao, Zhangjie

Abstract/Contents

Abstract: Imitation learning, which attempts to learn robot policies from demonstrations, is one of the most promising robot learning paradigms. Standard imitation learning algorithms assume that the demonstrations are provided by optimal experts who are capable of perfectly performing the task on the robot of interest (i.e., the target environment). However, under these assumptions, imitation learning usually requires a large number of demonstrations. This is limiting in many environments, where collecting optimal demonstrations is difficult for reasons such as the difficulty of controlling robots with high degrees of freedom or limited interactions with the environment. In practice, we often have access to large amounts of imperfect demonstrations, which are possibly not optimal or are produced by different agents with different morphologies or dynamics. Such imperfect demonstrations contain valuable information that can be helpful for learning the optimal policy. However, directly imitating these imperfect demonstrations will lead to learning a suboptimal policy. Instead of direct imitation learning, we propose developing new algorithms to utilize these imperfect demonstrations towards learning an optimal robot policy. In this thesis, we categorize imperfect demonstrations into: i) suboptimal demonstrations, ii) cross-domain demonstrations, and iii) infeasible demonstrations. Suboptimal demonstrations often contain non-optimal sequences of states and actions. For example, when reaching for an object, the robot might take a longer path towards the goal. Cross-domain demonstrations are collected from agents with different morphologies or dynamics, but such demonstrations can still have correspondence to behaviors of the target agent. Finally, infeasible demonstrations are drawn from other agents that might not have any correspondence to the target agent.
Prior work on learning from imperfect demonstrations focuses on only one of these categories. In this thesis, we comprehensively address the problem of learning from imperfect demonstrations: we formalize the different categories of imperfect demonstrations and introduce a set of robot learning algorithms that tackle each category. We further discuss the assumptions under which each of our methods should be used with imperfect demonstrations. We evaluate the developed algorithms in experiments on a number of robotic manipulation tasks, both in simulation and in the real world.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Cao, Zhangjie
Degree supervisor	Sadigh, Dorsa
Thesis advisor	Sadigh, Dorsa
Thesis advisor	Finn, Chelsea
Thesis advisor	Rosman, Guy
Degree committee member	Finn, Chelsea
Degree committee member	Rosman, Guy
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Zhangjie Cao.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/vq107gn6549

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
