Learning from imperfect demonstrations



Imitation learning is one of the most promising robot learning paradigms, in which robot policies are learned from demonstrations. Standard imitation learning algorithms assume that the demonstrations are provided by optimal experts who are capable of perfectly performing the task on the robot of interest (i.e., the target environment). Even under these assumptions, imitation learning usually requires large amounts of demonstrations. This is limiting in many environments, where collecting optimal demonstrations is difficult for reasons such as the difficulty of controlling robots with high degrees of freedom or limited interaction with the environment. In practice, we often have access to large amounts of imperfect demonstrations, which may be suboptimal or produced by different agents with different morphologies or dynamics. Such imperfect demonstrations contain valuable information that can be helpful for learning the optimal policy. However, directly imitating these imperfect demonstrations leads to learning a suboptimal policy. Instead of direct imitation learning, we propose developing new algorithms that utilize these imperfect demonstrations toward learning an optimal robot policy. In this thesis, we categorize imperfect demonstrations into: i) suboptimal demonstrations, ii) cross-domain demonstrations, and iii) infeasible demonstrations. Suboptimal demonstrations often contain non-optimal sequences of states and actions. For example, when reaching for an object, the robot might take a longer path toward the goal. Cross-domain demonstrations are collected from agents with different morphologies or dynamics, but such demonstrations can still correspond to behaviors of the target agent. Finally, infeasible demonstrations are drawn from other agents and might not have any correspondence to the target agent. Prior work on learning from imperfect demonstrations focuses on only one of these categories.
In this thesis, we comprehensively address the problem of learning from imperfect demonstrations: we formalize the different categories of imperfect demonstrations and introduce a set of robot learning algorithms, each tackling one category when learning from such demonstrations. We further discuss under what assumptions each of our methods should be used with imperfect demonstrations. We conduct experiments on a number of robotic manipulation tasks, in simulation and in the real world, to evaluate the developed algorithms.


Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English


Author Cao, Zhangjie
Degree supervisor Sadigh, Dorsa
Thesis advisor Sadigh, Dorsa
Thesis advisor Finn, Chelsea
Thesis advisor Rosman, Guy
Degree committee member Finn, Chelsea
Degree committee member Rosman, Guy
Associated with Stanford University, Computer Science Department


Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Zhangjie Cao.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/vq107gn6549

Access conditions

© 2022 by Zhangjie Cao
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
