Posterior sampling for efficient reinforcement learning

Dwaracherla, Vikranth Reddy

Abstract/Contents

Abstract: Reinforcement learning has shown tremendous success over the past few years. Much of this recent success can be attributed to agents learning from an inordinate amount of data in simulated environments. To achieve similar success in real environments, it is crucial to address data efficiency. Uncertainty quantification plays a prominent role in designing an intelligent agent that is data efficient. An agent with a notion of uncertainty can trade off exploration against exploitation and explore intelligently. Such an agent should consider not only the immediate information gain from an action but also the action's consequences for future learning prospects. An agent with this capability is said to exhibit deep exploration. So far, algorithms that tackle deep exploration have relied on representing epistemic uncertainty through ensembles or other hypermodels, exploration bonuses, or visitation count distributions. An open question is whether deep exploration can be achieved by an incremental reinforcement learning algorithm that tracks a single point estimate, without the additional complexity required to account for epistemic uncertainty. We answer this question in the affirmative. In this dissertation, we develop Langevin DQN, a variation of DQN that differs only in perturbing parameter updates with Gaussian noise, and demonstrate through a computational study that Langevin DQN achieves deep exploration. This is the first algorithm that demonstrably achieves deep exploration using a single point estimate. We also present index sampling, a novel method for efficiently generating approximate samples from a posterior over complex models such as neural networks, induced by a prior distribution over the model family and a set of input-output data pairs. In addition, we develop posterior sampling networks, a new approach to modeling this distribution over models.
We are particularly motivated by the application of our method to reinforcement learning (RL) problems, but it may also be of independent interest to the Bayesian deep learning community. Our method is especially useful in RL with complex exploration schemes that use more than a single sample from the posterior, such as information-directed sampling. Finally, we present preliminary results demonstrating that the Langevin DQN update rule could be used to train posterior sampling networks, as an alternative to index sampling, and further improve data efficiency.
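The abstract characterizes Langevin DQN only as DQN with parameter updates perturbed by Gaussian noise. As a rough illustration of that idea, here is a minimal NumPy sketch of one Langevin-style step for a hypothetical linear Q-function: a semi-gradient Q-learning update plus isotropic Gaussian noise scaled by sqrt(2 * lr * temperature), the usual stochastic gradient Langevin dynamics form. The function name `langevin_td_step`, the linear parameterization, and the noise scaling are assumptions for illustration, not the thesis's implementation; a full DQN would also use a neural network, a replay buffer, and a target network, all omitted here.

```python
import numpy as np


def langevin_td_step(theta, phi_s, action, reward, phi_next, gamma, lr, temperature, rng):
    """One Gaussian-perturbed semi-gradient Q-learning step (hypothetical sketch).

    Assumed setup: Q(s, a) = phi(s) @ theta[:, a], with theta of shape (d, num_actions).
    The noise scale sqrt(2 * lr * temperature) is the standard SGLD form and is an
    assumption, not a detail taken from the thesis.
    """
    # Current estimate Q(s, a).
    q_sa = phi_s @ theta[:, action]
    # Bootstrapped TD target with a max over next-state actions, treated as a constant.
    target = reward + gamma * np.max(phi_next @ theta)
    td_error = target - q_sa
    # Semi-gradient of 0.5 * td_error**2 w.r.t. theta[:, action] is -td_error * phi_s.
    grad = np.zeros_like(theta)
    grad[:, action] = -td_error * phi_s
    # Langevin perturbation: Gaussian noise added to every parameter.
    noise = rng.standard_normal(theta.shape)
    return theta - lr * grad + np.sqrt(2.0 * lr * temperature) * noise
```

Setting `temperature=0` recovers an ordinary, noiseless semi-gradient Q-learning step, so in this sketch the temperature alone controls how much randomness is injected into the single point estimate.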

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2021; ©2021
Publication date	2021; 2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Dwaracherla, Vikranth Reddy
Degree supervisor	Van Roy, Benjamin
Thesis advisor	Van Roy, Benjamin
Thesis advisor	Brunskill, Emma
Thesis advisor	Pilanci, Mert
Degree committee member	Brunskill, Emma
Degree committee member	Pilanci, Mert
Associated with	Stanford University, Department of Electrical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Vikranth Dwaracherla.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/jb495qn9584

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
