Information-directed sampling for reinforcement learning

Abstract/Contents

Abstract
Reinforcement learning has enjoyed a resurgence in popularity over the past decade thanks to the ever-increasing availability of computing power. Many success stories of reinforcement learning seem to suggest a potential gateway to creating intelligent agents that are capable of performing tasks with human-level proficiency. However, many state-of-the-art reinforcement learning algorithms require a tremendous amount of simulated data, which is not practical when data is generated from actual interactions in the real world. Addressing data efficiency will be crucial for making reinforcement learning practical for real-world applications. In this dissertation, we take an information-theoretic approach to reason about how an agent should acquire information in an environment to improve decision-making. We generalize the information-directed sampling (IDS) decision rule from the online decision-making literature to reinforcement learning. This decision rule aims to acquire useful information about the environment while also taking into consideration the costs of information acquisition. We argue that IDS can demonstrate desirable information-seeking behaviors in a reinforcement learning problem where existing methods fail. We hypothesize that in practical environments that are typically rich in observations, IDS has the potential to significantly improve data efficiency relative to existing exploration schemes. Furthermore, we analyze the expected regret of IDS for three stylized classes of environments: linear bandits, tabular Markov decision processes (MDPs), and factored MDPs. We derive regret bounds that are nearly competitive with state-of-the-art regret bounds, which demonstrates the promise of our information-theoretic design concept. Lastly, the form of IDS studied in this dissertation should be viewed as an agent design concept rather than a concrete algorithm. Major work remains to be done to design practical algorithms that preserve the benefits of this conceptual decision rule while being computationally tractable. We highlight some key aspects of designing a practical IDS agent and propose several research directions for addressing each aspect.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2020; ©2020
Publication date 2020
Issuance monographic
Language English

Creators/Contributors

Author Lu, Xiuyuan
Degree supervisor Van Roy, Benjamin
Thesis advisor Van Roy, Benjamin
Thesis advisor Brunskill, Emma
Thesis advisor Johari, Ramesh, 1976-
Degree committee member Brunskill, Emma
Degree committee member Johari, Ramesh, 1976-
Associated with Stanford University, Department of Management Science & Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Xiuyuan Lu.
Note Submitted to the Department of Management Science & Engineering.
Thesis Thesis Ph.D. Stanford University 2020.
Location electronic resource

Access conditions

Copyright
© 2020 by Xiuyuan Lu
