Learning preferences for interactive autonomy

Biyik, Erdem

Abstract/Contents

Abstract: When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. Reward functions, which specify a robot's objective, are used to encode both. However, designing reward functions can be extremely challenging for complex tasks and environments. A more promising approach is to learn reward functions from humans. Several recent robot learning works have embraced this approach and leverage human demonstrations to learn reward functions. Known as inverse reinforcement learning, this approach relies on a fundamental assumption: humans can provide near-optimal demonstrations to the robot. Unfortunately, this is rarely the case: human demonstrations are often suboptimal for reasons such as the difficulty of teleoperation, the robot's many degrees of freedom, or humans' cognitive limitations. This thesis works toward learning reward functions from humans using other, more reliable data modalities. Specifically, we study how reward functions can be learned using comparative feedback, in which the human user compares multiple robot trajectories instead of (or in addition to) providing demonstrations. To this end, we first propose various forms of comparative feedback, such as pairwise comparisons, best-of-many choices, rankings, and scaled comparisons, and describe how a robot can use each form of human feedback to infer a reward function, which may be parametric or non-parametric. We discuss the pros and cons of each comparative feedback modality in detail, and show how such feedback enables us to outperform standard inverse reinforcement learning that utilizes only demonstrations. To increase data efficiency, we also propose active learning techniques that let the robot request the comparative feedback that maximizes the expected information gained from the user's response.
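As a rough illustration of the pairwise-comparison setting the abstract describes, the sketch below learns a linear reward from "A or B?" answers and picks each query by expected information gain. It is not taken from the thesis: the logistic preference model, the linear reward over trajectory features, the particle approximation of the posterior, and every name and number in it (`preference_prob`, `true_w`, the feature vectors) are assumptions made for illustration only.

```python
import math
import random


def preference_prob(w, phi_a, phi_b):
    """P(user prefers A over B | w), assuming a logistic model on the reward gap."""
    diff = sum(wi * (a - b) for wi, a, b in zip(w, phi_a, phi_b))
    return 1.0 / (1.0 + math.exp(-diff))


def binary_entropy(p):
    """Entropy in bits of a yes/no answer that is 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def information_gain(particles, weights, phi_a, phi_b):
    """Mutual information between the user's answer and the reward weights.

    The belief over weights is approximated by weighted samples (particles).
    """
    probs = [preference_prob(w, phi_a, phi_b) for w in particles]
    mean_p = sum(wt * p for wt, p in zip(weights, probs))
    expected_entropy = sum(wt * binary_entropy(p) for wt, p in zip(weights, probs))
    return binary_entropy(mean_p) - expected_entropy


def update_weights(particles, weights, phi_a, phi_b, a_preferred):
    """Bayesian reweighting of the particles after one observed comparison."""
    updated = []
    for w, wt in zip(particles, weights):
        p = preference_prob(w, phi_a, phi_b)
        updated.append(wt * (p if a_preferred else 1.0 - p))
    total = sum(updated)
    return [u / total for u in updated]


def posterior_mean(particles, weights):
    """Weighted average of the particles, used as the reward estimate."""
    dim = len(particles[0])
    return [sum(wt * w[i] for w, wt in zip(particles, weights)) for i in range(dim)]


def run_demo(num_queries=30, seed=0):
    """Simulate a user with hypothetical true weights and query them actively."""
    rng = random.Random(seed)
    true_w = [1.0, -0.5]  # hypothetical ground truth, used only to simulate answers
    particles = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
    weights = [1.0 / len(particles)] * len(particles)
    for _ in range(num_queries):
        # Candidate queries: random pairs of 2-D trajectory feature vectors.
        candidates = [
            ([rng.uniform(-1, 1), rng.uniform(-1, 1)],
             [rng.uniform(-1, 1), rng.uniform(-1, 1)])
            for _ in range(20)
        ]
        # Active learning step: ask the most informative candidate.
        phi_a, phi_b = max(
            candidates, key=lambda q: information_gain(particles, weights, q[0], q[1])
        )
        a_preferred = rng.random() < preference_prob(true_w, phi_a, phi_b)
        weights = update_weights(particles, weights, phi_a, phi_b, a_preferred)
    return posterior_mean(particles, weights)
```

Calling `run_demo()` returns the posterior-mean weight estimate after the simulated queries. The other feedback forms mentioned in the abstract (best-of-many choices, rankings, scaled comparisons) would need a different likelihood in place of `preference_prob`; this sketch covers only the pairwise case.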

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Biyik, Erdem
Degree supervisor	Sadigh, Dorsa
Thesis advisor	Sadigh, Dorsa
Thesis advisor	Brunskill, Emma
Thesis advisor	Finn, Chelsea
Degree committee member	Brunskill, Emma
Degree committee member	Finn, Chelsea
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Erdem Biyik.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/vz918rc3628

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
