Heuristics for large scale dynamic programming

Abstract
Dynamic programming algorithms are used to solve problems in a wide variety of fields, including decision analysis, control systems, communication systems, dynamic pricing and bidding, networking, supply chain management, and planning. These applications can be challenging, with complexity that grows exponentially, so exact solutions are rarely tractable. Improvements in the efficiency and quality of approximate dynamic programming algorithms can therefore provide significant benefit. One promising approach is on-line reinforcement learning: algorithms that balance exploration and exploitation on Markov decision processes. In practice, however, these algorithms are usually applied off-line to generate a control policy, which is then fixed and used on-line. This work focuses on improving off-line policy generation on both discrete and continuous state-space processes, without the distraction of exploitation.

It introduces a variety of modifications to, and replacements for, existing algorithms: backtracking, exact and approximate Kalman filter value modeling, variance feedback, visitation allocation, and particle filter optimization. Backtracking re-orders the update calculations to accelerate the learning process for value iteration. Exact Kalman filter value modeling develops a Bayesian dynamic linear Gaussian representation of value using basis functions. Approximate Kalman filter value modeling is a simplified version that appears to be both higher-performing and more robust than the exact version. Variance feedback is an adaptive method for updating a value model that exploits prior knowledge while avoiding overconfidence. Visitation allocation is a heuristic for better managing the exploration process. Particle filter optimization is a stochastic optimization technique applicable to many policy search problems, including non-convex ones. These techniques are demonstrated on a variety of benchmark problems, spanning discrete and continuous, discounted and undiscounted, and finite and infinite horizon processes.
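
The abstract only sketches the backtracking idea, but the re-ordering of value-iteration updates it describes can be illustrated with a priority-queue scheme in the spirit of prioritized sweeping. The following is a minimal, hypothetical Python sketch under that assumption, not the thesis's algorithm; the function name and the transitions/reward data structures are invented for the example.

    import heapq
    import itertools
    from collections import defaultdict

    def prioritized_value_iteration(states, actions, transitions, reward,
                                    gamma=0.95, theta=1e-6):
        # transitions[s][a] is a list of (next_state, probability) pairs;
        # reward[s][a] is the expected immediate reward for action a in state s.
        predecessors = defaultdict(set)  # states whose backups depend on s
        for s in states:
            for a in actions:
                for s2, p in transitions[s][a]:
                    if p > 0:
                        predecessors[s2].add(s)

        V = {s: 0.0 for s in states}

        def backup(s):  # one Bellman backup at state s
            return max(reward[s][a]
                       + gamma * sum(p * V[s2] for s2, p in transitions[s][a])
                       for a in actions)

        tie = itertools.count()  # tie-breaker so states are never compared
        heap = [(-abs(backup(s) - V[s]), next(tie), s) for s in states]
        heapq.heapify(heap)

        while heap:
            neg_err, _, s = heapq.heappop(heap)
            if -neg_err < theta:  # largest queued Bellman error is negligible
                break
            V[s] = backup(s)
            # "Backtrack": re-queue predecessors whose error just grew.
            for sp in predecessors[s]:
                err = abs(backup(sp) - V[sp])
                if err > theta:
                    heapq.heappush(heap, (-err, next(tie), sp))
        return V

Compared with sweeping all states in a fixed order, this spends backups where the value estimates are changing fastest, which is one way to realize the acceleration the abstract attributes to backtracking.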
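
For the Kalman filter value models, the abstract specifies a Bayesian dynamic linear Gaussian representation of value over basis functions but gives no equations. A generic update of that form, which treats the one-step return as a noisy linear observation of the weights (as in Kalman temporal-difference methods), might look like the sketch below; the noise parameters obs_var and process_var are illustrative assumptions, not values from the thesis.

    import numpy as np

    def kalman_value_update(w, P, phi_s, phi_s2, r,
                            gamma=0.95, obs_var=1.0, process_var=1e-4):
        # Linear value model V(s) = w @ phi(s) with Gaussian weight
        # posterior N(w, P). One transition (s, r, s') is treated as a
        # noisy observation r ~ N(h @ w, obs_var), h = phi(s) - gamma*phi(s').
        P = P + process_var * np.eye(len(w))  # time update: let weights drift
        h = phi_s - gamma * phi_s2
        S = h @ P @ h + obs_var               # innovation variance
        K = P @ h / S                         # Kalman gain
        w = w + K * (r - h @ w)               # measurement update of the mean
        P = P - np.outer(K, h @ P)            # measurement update of covariance
        return w, P

The covariance P is what distinguishes this from ordinary temporal-difference learning: it tracks the model's uncertainty, which is the kind of quantity the abstract's variance feedback heuristic could feed back into the learning process.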
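
Particle filter optimization likewise admits a compact generic sketch: maintain a population of candidate policy parameters, reweight by the objective, resample, and perturb. The version below is a plain illustration of that loop, with all names and hyperparameters (n_particles, sigma, decay) chosen for the example rather than taken from the thesis; because it only samples and compares objective values, it needs no gradients and tolerates non-convex objectives.

    import numpy as np

    def particle_filter_optimize(f, x0, n_particles=200, n_iters=50,
                                 sigma=0.5, decay=0.95, seed=None):
        # Maximize f over parameter vectors near x0, gradient-free.
        rng = np.random.default_rng(seed)
        dim = len(x0)
        particles = x0 + sigma * rng.standard_normal((n_particles, dim))
        for _ in range(n_iters):
            scores = np.array([f(p) for p in particles])
            # Softmax-style importance weights favor high-scoring particles.
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            # Resample proportionally to weight, then perturb (the
            # "transition" step), shrinking the noise each iteration.
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            sigma *= decay
            particles = (particles[idx]
                         + sigma * rng.standard_normal((n_particles, dim)))
        return particles[np.argmax([f(p) for p in particles])]

    # e.g. best = particle_filter_optimize(lambda x: -np.sum(x**2), np.zeros(3))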

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource
Publication date: 2013
Issuance: monographic
Language: English

Creators/Contributors

Associated with: Tripp, Charles Edison
Associated with: Stanford University, Department of Electrical Engineering
Primary advisor: Shachter, Ross D.
Thesis advisor: Howard, Ronald A. (Ronald Arthur), 1934-
Thesis advisor: Van Roy, Benjamin

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Charles Edison Tripp.
Note: Submitted to the Department of Electrical Engineering.
Thesis: Thesis (Ph.D.)--Stanford University, 2013.
Location: electronic resource

Access conditions

Copyright: © 2013 by Charles Edison Tripp
License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
