Information-directed sampling for reinforcement learning
Abstract/Contents
- Abstract
- Reinforcement learning has enjoyed a resurgence in popularity over the past decade thanks to the ever-increasing availability of computing power. Many success stories of reinforcement learning seem to suggest a potential gateway to creating intelligent agents that are capable of performing tasks with human-level proficiency. However, many state-of-the-art reinforcement learning algorithms require a tremendous amount of simulated data, which is not practical when data is generated from actual interactions in the real world. Addressing data efficiency will be crucial for making reinforcement learning practical for real-world applications. In this dissertation, we take an information-theoretic approach to reason about how an agent should acquire information in an environment to improve decision-making. We generalize the information-directed sampling (IDS) decision rule from the online decision-making literature to reinforcement learning. This decision rule aims to acquire useful information about the environment while also taking into consideration the costs of information acquisition. We argue that IDS can demonstrate desirable information-seeking behaviors in reinforcement learning problems where existing methods fail. We hypothesize that in practical environments, which are typically rich in observations, IDS has the potential to significantly improve data efficiency relative to existing exploration schemes. Furthermore, we analyze the expected regret of IDS for three stylized classes of environments: linear bandits, tabular Markov decision processes (MDPs), and factored MDPs. We derive regret bounds that are nearly competitive with state-of-the-art regret bounds, which demonstrates the promise of our information-theoretic design concept. Lastly, the form of IDS studied in this dissertation should be viewed as an agent design concept rather than a concrete algorithm. Major work remains to be done to design practical algorithms that preserve the benefits of this conceptual decision rule while being computationally tractable. We highlight some key aspects of designing a practical IDS agent and propose several research directions for addressing each aspect.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2020; ©2020 |
Publication date | 2020 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Lu, Xiuyuan |
---|---|
Degree supervisor | Van Roy, Benjamin |
Thesis advisor | Van Roy, Benjamin |
Thesis advisor | Brunskill, Emma |
Thesis advisor | Johari, Ramesh, 1976- |
Degree committee member | Brunskill, Emma |
Degree committee member | Johari, Ramesh, 1976- |
Associated with | Stanford University, Department of Management Science & Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Xiuyuan Lu. |
---|---|
Note | Submitted to the Department of Management Science & Engineering. |
Thesis | Thesis Ph.D. Stanford University 2020. |
Location | electronic resource |
Access conditions
- Copyright
- © 2020 by Xiuyuan Lu