Learning from imperfect demonstrations
- Imitation learning is one of the most promising robot learning paradigms, which attempts to learn robot policies from demonstrations. Standard imitation learning algorithms assume that the demonstrations are provided by optimal experts who are capable of perfectly performing the task on the robot of interest (i.e., the target environment). However, under these assumptions, imitation learning usually requires large amounts of demonstrations. This is limiting in many environments, where collecting optimal demonstrations is difficult due to various reasons such as difficulty of controlling robots with high degrees of freedom or limited interactions with the environment. In practice, we often have access to large amounts of imperfect demonstrations, which are possibly not optimal or are produced by different agents with different morphologies or dynamics. Such imperfect demonstrations contain valuable information that can be helpful for learning the optimal policy. However, directly imitating these imperfect demonstrations will lead to learning a suboptimal policy. Instead of direct imitation learning, we propose developing new algorithms to utilize these imperfect demonstrations towards learning an optimal robot policy. In this thesis, we categorize imperfect demonstrations into: i) suboptimal demonstrations, ii) cross-domain demonstrations, and iii) infeasible demonstrations. Suboptimal demonstrations often contain non-optimal sequence of states and actions. For example when reaching an object, the robot might take a longer path towards the goal. Cross-domain demonstrations are collected from agents with different morphologies or dynamics, but such demonstrations can still have correspondence to behaviors of the target agent. Finally, infeasible demonstrations are drawn from other agents that might not have any correspondence to the target agent. Prior works in learning from imperfect demonstrations only focus on one of these categories of imperfect demonstrations. In this thesis, we comprehensively address the problem of learning from imperfect demonstrations: We formalize the different categories of imperfect demonstrations and introduce a set of robot learning algorithms that tackle each category when learning from these demonstrations. We will further discuss under what assumptions each of our methods should be used with imperfect demonstrations. We conduct experiments in a number of robotics manipulation tasks in simulation and real to demonstrate the developed algorithms.
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Degree committee member
|Degree committee member
|Stanford University, Computer Science Department
|Statement of responsibility
|Submitted to the Computer Science Department.
|Thesis Ph.D. Stanford University 2022.
- © 2022 by Zhangjie Cao
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
Also listed in
Loading usage metrics...