More sample efficient and robust reinforcement learning with domain knowledge
Abstract/Contents
- Abstract
- Reinforcement learning has achieved great success in environments with good simulators (for example, Atari, StarCraft, Go, and various robotic tasks), where agents have achieved performance on par with or exceeding that of humans. However, the application of reinforcement learning to real-world, human-facing applications has been limited by issues such as high sample complexity. This dissertation proposes methods that work towards addressing these issues by utilizing domain knowledge and structure. Domain knowledge was the main component of the first class of successful AI systems, expert rule-based systems; however, owing to many challenges, including the large amount of expensive expert time required, the research community has shifted towards data-driven methods that learn automatically. This dissertation presents methods that aim to combine the benefits of expert-based systems with the strengths of reinforcement learning, achieving better sample efficiency or more robust performance while placing only a minimal burden on experts. It proposes several novel methods for leveraging different types of domain knowledge across different reinforcement learning settings: incorporating expert domain knowledge and heuristics to speed up online reinforcement learning; exploiting repeated structure in procedure/imitation learning; accounting for anticipated domain distribution shift in batch contextual bandit settings; and using a curriculum graph to create better personalized adaptive progressions in a real-world educational webgame. We empirically evaluate our methods in simulators designed with real-world data, such as recommendation systems and educational activity sequencing, and we additionally test one of our methods in a real-world Korean language learning webgame.
- For all our methods, we demonstrate faster or more robust performance. This shows promise for reinforcement learning methods to be helpful in human-facing applications.
Description
| Type of resource | text |
|---|---|
| Form | electronic resource; remote; computer; online resource |
| Extent | 1 online resource |
| Place | California |
| Place | [Stanford, California] |
| Publisher | [Stanford University] |
| Copyright date | ©2022 |
| Publication date | 2022 |
| Issuance | monographic |
| Language | English |
Creators/Contributors
| Author | Mu, Tong |
|---|---|
| Degree supervisor | Brunskill, Emma |
| Thesis advisor | Brunskill, Emma |
| Thesis advisor | Sadigh, Dorsa |
| Thesis advisor | Van Roy, Benjamin |
| Degree committee member | Sadigh, Dorsa |
| Degree committee member | Van Roy, Benjamin |
| Associated with | Stanford University, Department of Electrical Engineering |
Subjects
| Genre | Theses |
|---|---|
| Genre | Text |
Bibliographic information
| Statement of responsibility | Tong Mu. |
|---|---|
| Note | Submitted to the Department of Electrical Engineering. |
| Thesis | Thesis (Ph.D.), Stanford University, 2022. |
| Location | https://purl.stanford.edu/zt338pg7556 |
Access conditions
- Copyright
- © 2022 by Tong Mu
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).