Information-directed sampling for reinforcement learning

Lu, Xiuyuan

Abstract/Contents

Abstract: Reinforcement learning has enjoyed a resurgence in popularity over the past decade thanks to the ever-increasing availability of computing power. Many success stories of reinforcement learning seem to suggest a potential gateway to creating intelligent agents that are capable of performing tasks with human-level proficiency. However, many state-of-the-art reinforcement learning algorithms require a tremendous amount of simulated data, which is not practical when data is generated from actual interactions in the real world. Addressing data efficiency will be crucial for making reinforcement learning practical for real-world applications. In this dissertation, we take an information-theoretic approach to reason about how an agent should acquire information in an environment to improve decision-making. We generalize the information-directed sampling (IDS) decision rule from the online decision-making literature to reinforcement learning. This decision rule aims to acquire useful information about the environment while also accounting for the costs of information acquisition. We argue that IDS can demonstrate desirable information-seeking behaviors in reinforcement learning problems where existing methods fail. We hypothesize that in practical environments, which are typically rich in observations, IDS has the potential to significantly improve data efficiency relative to existing exploration schemes. Furthermore, we analyze the expected regret of IDS for three stylized classes of environments: linear bandits, tabular Markov decision processes (MDPs), and factored MDPs. We derive regret bounds that are nearly competitive with state-of-the-art regret bounds, which demonstrates the promise of our information-theoretic design concept. Lastly, the form of IDS studied in this dissertation should be viewed as an agent design concept rather than a concrete algorithm. Major work remains to design practical algorithms that preserve the benefits of this conceptual decision rule while being computationally tractable. We highlight some key aspects of designing a practical IDS agent and propose several research directions for addressing each aspect.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2020
Publication date	2020
Issuance	monographic
Language	English

Creators/Contributors

Author	Lu, Xiuyuan
Degree supervisor	Van Roy, Benjamin
Thesis advisor	Van Roy, Benjamin
Thesis advisor	Brunskill, Emma
Thesis advisor	Johari, Ramesh, 1976-
Degree committee member	Brunskill, Emma
Degree committee member	Johari, Ramesh, 1976-
Associated with	Stanford University, Department of Management Science & Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Xiuyuan Lu.
Note	Submitted to the Department of Management Science & Engineering.
Thesis	Thesis Ph.D. Stanford University 2020.
Location	electronic resource
