More sample efficient and robust reinforcement learning with domain knowledge


Abstract/Contents

Abstract
Reinforcement learning has achieved great success in environments with good simulators (for example, Atari, StarCraft, Go, and various robotic tasks), where agents have reached or exceeded human-level performance. However, the application of reinforcement learning to real-world, human-facing applications has been limited by issues such as high sample complexity. This dissertation proposes methods that address these issues by utilizing domain knowledge and structure. Domain knowledge was the main component of the first class of successful AI systems, expert rule-based systems. However, due to many challenges, including the large amount of expensive expert time required, the research community has shifted towards data-driven methods that learn automatically. This dissertation presents methods that aim to combine the benefits of expert-based systems with the strengths of reinforcement learning: methods that achieve better sample efficiency or more robust performance while placing only a minimal burden on experts. The dissertation proposes several novel methods for leveraging different types of domain knowledge across different reinforcement learning settings. It introduces methods for incorporating expert domain knowledge and heuristics to speed up online reinforcement learning; for incorporating repeated structure in procedure/imitation learning; for incorporating anticipated domain distribution shift in batch contextual bandit settings; and for incorporating a curriculum graph to create better personalized adaptive progressions in a real-world educational webgame. We empirically evaluate our methods in simulators designed with real-world data, such as recommendation systems and educational activity sequencing, and we additionally test one of our methods in a real-world Korean language learning webgame. For all our methods, we demonstrate improved sample efficiency or more robust performance, showing promise for reinforcement learning to be helpful in human-facing applications.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Mu, Tong
Degree supervisor Brunskill, Emma
Thesis advisor Brunskill, Emma
Thesis advisor Sadigh, Dorsa
Thesis advisor Van Roy, Benjamin
Degree committee member Sadigh, Dorsa
Degree committee member Van Roy, Benjamin
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Tong Mu.
Note Submitted to the Department of Electrical Engineering.
Thesis Ph.D., Stanford University, 2022.
Location https://purl.stanford.edu/zt338pg7556

Access conditions

Copyright
© 2022 by Tong Mu
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
