Efficient reinforcement learning with value function generalization

Wen, Zheng; Stanford University, Department of Electrical Engineering.

Abstract/Contents

Abstract: Reinforcement learning (RL) is concerned with how an agent should learn to make decisions over time while interacting with an environment. A growing body of work has produced RL algorithms with sample and computational efficiency guarantees. However, most of this work focuses on "tabula rasa" learning; i.e., algorithms aim to learn with little or no prior knowledge about the environment. Such algorithms exhibit sample complexities that grow at least linearly in the number of states, and they are of limited practical import since state spaces in most relevant contexts are enormous. There is a need for algorithms that generalize in order to learn how to make effective decisions at states beyond the scope of past experience. This dissertation focuses on the open issue of developing efficient RL algorithms that leverage value function generalization (VFG). It consists of two parts. In the first part, we present sample complexity results for two classes of RL problems: deterministic systems with general forms of VFG, and Markov decision processes (MDPs) with a finite hypothesis class. The results provide upper bounds that are independent of state and action space cardinalities and polynomial in other problem parameters. In the second part, building on insights from our sample complexity analyses, we propose randomized least-squares value iteration (RLSVI), an RL algorithm for MDPs with VFG via linear hypothesis classes. The algorithm is based on a new notion of randomized value function exploration. Through computational studies, we compare the performance of RLSVI against least-squares value iteration (LSVI) with Boltzmann exploration or epsilon-greedy exploration, which are widely used in RL with VFG. Results demonstrate that RLSVI is orders of magnitude more efficient.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2014
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Wen, Zheng
Associated with	Stanford University, Department of Electrical Engineering.
Primary advisor	Van Roy, Benjamin
Thesis advisor	Van Roy, Benjamin
Thesis advisor	Boyd, Stephen P
Thesis advisor	Johari, Ramesh, 1976-
Thesis advisor	O'Neill, Daniel

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Zheng Wen.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Ph.D. Stanford University 2014
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
