Heuristics for large scale dynamic programming

Abstract
Dynamic programming algorithms are used to solve problems in a wide variety of fields, including decision analysis, control systems, communication systems, dynamic pricing and bidding, networking, supply chain management, and planning. These applications can be challenging, with complexity that grows exponentially, so exact solutions are rarely tractable. Improvements in the efficiency and quality of approximate dynamic programming algorithms can therefore provide significant benefit. One promising approach is on-line reinforcement learning: algorithms that balance exploration and exploitation on Markov decision processes. In practice, however, these algorithms are usually applied off-line to generate a control policy, which is then fixed and used on-line. This work focuses on improving off-line policy generation on both discrete and continuous state-space processes, without the distraction of exploitation.

It introduces a variety of modifications to, and replacements for, existing algorithms: backtracking, exact and approximate Kalman filter value modeling, variance feedback, visitation allocation, and particle filter optimization. Backtracking re-orders the update calculations to accelerate the learning process for value iteration. Exact Kalman filter value modeling develops a Bayesian dynamic linear Gaussian representation of value using basis functions. Approximate Kalman filter value modeling is a simplified version that appears to be both higher-performing and more robust than the exact version. Variance feedback is an adaptive method for updating a value model that exploits prior knowledge while avoiding overconfidence. Visitation allocation is a heuristic for better managing the exploration process. Particle filter optimization is a stochastic optimization technique applicable to many policy search problems, including non-convex ones. These techniques are demonstrated on a variety of benchmark problems, spanning discrete and continuous, discounted and undiscounted, and finite and infinite horizon processes.
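
The abstract only sketches the backtracking idea, but the re-ordering of value-iteration updates it describes can be illustrated with a priority-queue scheme in the spirit of prioritized sweeping. The following is a minimal, hypothetical Python sketch under that assumption, not the thesis's algorithm; the function name and the transitions/reward data structures are invented for the example.

    import heapq
    import itertools
    from collections import defaultdict

    def prioritized_value_iteration(states, actions, transitions, reward,
                                    gamma=0.95, theta=1e-6):
        # transitions[s][a] is a list of (next_state, probability) pairs;
        # reward[s][a] is the expected immediate reward for action a in state s.
        predecessors = defaultdict(set)  # states whose backups depend on s
        for s in states:
            for a in actions:
                for s2, p in transitions[s][a]:
                    if p > 0:
                        predecessors[s2].add(s)

        V = {s: 0.0 for s in states}

        def backup(s):  # one Bellman backup at state s
            return max(reward[s][a]
                       + gamma * sum(p * V[s2] for s2, p in transitions[s][a])
                       for a in actions)

        tie = itertools.count()  # tie-breaker so states are never compared
        heap = [(-abs(backup(s) - V[s]), next(tie), s) for s in states]
        heapq.heapify(heap)

        while heap:
            neg_err, _, s = heapq.heappop(heap)
            if -neg_err < theta:  # largest queued Bellman error is negligible
                break
            V[s] = backup(s)
            # "Backtrack": re-queue predecessors whose error just grew.
            for sp in predecessors[s]:
                err = abs(backup(sp) - V[sp])
                if err > theta:
                    heapq.heappush(heap, (-err, next(tie), sp))
        return V

Compared with sweeping all states in a fixed order, this spends backups where the value estimates are changing fastest, which is one way to realize the acceleration the abstract attributes to backtracking.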
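
For the Kalman filter value models, the abstract specifies a Bayesian dynamic linear Gaussian representation of value over basis functions but gives no equations. A generic update of that form, which treats the one-step return as a noisy linear observation of the weights (as in Kalman temporal-difference methods), might look like the sketch below; the noise parameters obs_var and process_var are illustrative assumptions, not values from the thesis.

    import numpy as np

    def kalman_value_update(w, P, phi_s, phi_s2, r,
                            gamma=0.95, obs_var=1.0, process_var=1e-4):
        # Linear value model V(s) = w @ phi(s) with Gaussian weight
        # posterior N(w, P). One transition (s, r, s') is treated as a
        # noisy observation r ~ N(h @ w, obs_var), h = phi(s) - gamma*phi(s').
        P = P + process_var * np.eye(len(w))  # time update: let weights drift
        h = phi_s - gamma * phi_s2
        S = h @ P @ h + obs_var               # innovation variance
        K = P @ h / S                         # Kalman gain
        w = w + K * (r - h @ w)               # measurement update of the mean
        P = P - np.outer(K, h @ P)            # measurement update of covariance
        return w, P

The covariance P is what distinguishes this from ordinary temporal-difference learning: it tracks the model's uncertainty, which is the kind of quantity the abstract's variance feedback heuristic could feed back into the learning process.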
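
Particle filter optimization likewise admits a compact generic sketch: maintain a population of candidate policy parameters, reweight by the objective, resample, and perturb. The version below is a plain illustration of that loop, with all names and hyperparameters (n_particles, sigma, decay) chosen for the example rather than taken from the thesis; because it only samples and compares objective values, it needs no gradients and tolerates non-convex objectives.

    import numpy as np

    def particle_filter_optimize(f, x0, n_particles=200, n_iters=50,
                                 sigma=0.5, decay=0.95, seed=None):
        # Maximize f over parameter vectors near x0, gradient-free.
        rng = np.random.default_rng(seed)
        dim = len(x0)
        particles = x0 + sigma * rng.standard_normal((n_particles, dim))
        for _ in range(n_iters):
            scores = np.array([f(p) for p in particles])
            # Softmax-style importance weights favor high-scoring particles.
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            # Resample proportionally to weight, then perturb (the
            # "transition" step), shrinking the noise each iteration.
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            sigma *= decay
            particles = (particles[idx]
                         + sigma * rng.standard_normal((n_particles, dim)))
        return particles[np.argmax([f(p) for p in particles])]

    # e.g. best = particle_filter_optimize(lambda x: -np.sum(x**2), np.zeros(3))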

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource
Publication date: 2013
Issuance: monographic
Language: English

Creators/Contributors

Associated with: Tripp, Charles Edison
Associated with: Stanford University, Department of Electrical Engineering
Primary advisor: Shachter, Ross D.
Thesis advisor: Howard, Ronald A. (Ronald Arthur), 1934-
Thesis advisor: Van Roy, Benjamin

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Charles Edison Tripp.
Note: Submitted to the Department of Electrical Engineering.
Thesis: Thesis (Ph.D.)--Stanford University, 2013.
Location: electronic resource

Access conditions

Copyright: © 2013 by Charles Edison Tripp
License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
