Heuristics for large scale dynamic programming
Abstract/Contents
- Abstract
- Dynamic programming algorithms are used to solve problems in a wide variety of fields, including Decision Analysis, Control Systems, Communication Systems, Dynamic Pricing and Bidding, Networking, Supply Chain Management, and Planning. These applications can be challenging, with exponential complexity, so exact solutions are rarely tractable. Improvements in the efficiency and quality of approximate dynamic programming algorithms can therefore provide significant benefit. One promising approach is on-line reinforcement learning: algorithms that balance exploration and exploitation on Markov decision processes. In practice, however, these algorithms are usually applied off-line to generate a control policy that is then fixed and used on-line. This work focuses on improving off-line policy generation on both discrete and continuous state-space processes, without the distraction of exploitation. It introduces modifications to, and replacements for, existing algorithms: backtracking, exact and approximate Kalman filter value modeling, variance feedback, visitation allocation, and particle filter optimization. Backtracking re-orders the update calculations to accelerate the learning process for value iteration. Exact Kalman filter value modeling develops a Bayesian dynamic linear Gaussian representation of value using basis functions. Approximate Kalman filter value modeling is a simplified version that appears to be both higher-performing and more robust than the exact version. Variance feedback is an adaptive method for updating a value model that exploits prior knowledge while avoiding overconfidence. Visitation allocation is a heuristic for better managing the exploration process. Particle filter optimization is a stochastic optimization technique that can be applied to many policy search problems, including non-convex ones.
These techniques are demonstrated on a variety of benchmark problems, including both discrete and continuous, discounted and undiscounted, and finite and infinite horizon processes.
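The backtracking idea described above — re-ordering value-iteration updates so that states whose successors just changed are revisited first — can be sketched as a prioritized sweep. The three-state chain MDP and the queue discipline below are illustrative assumptions for exposition, not the thesis's exact formulation.

```python
GAMMA = 0.9

# transitions[s][a] = list of (probability, next_state, reward)
transitions = {
    0: {0: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)]},  # absorbing terminal state
}

# predecessors[s] = states with an action that can lead to s
predecessors = {0: {1}, 1: {0}, 2: {1, 2}}

def backup(V, s):
    """One Bellman backup: max over actions of expected reward plus discounted value."""
    return max(
        sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
        for outcomes in transitions[s].values()
    )

def prioritized_value_iteration(tol=1e-6):
    V = {s: 0.0 for s in transitions}
    queue = set(transitions)            # start by scheduling every state
    while queue:
        s = queue.pop()
        new_v = backup(V, s)
        if abs(new_v - V[s]) > tol:
            V[s] = new_v
            queue |= predecessors[s]    # "backtrack": reschedule states that lead here
    return V

V = prioritized_value_iteration()
```

Compared with plain value iteration, which sweeps all states every pass, this propagates each value change only to the states whose backups it can actually affect.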
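Particle filter optimization, as characterized in the abstract, maintains a population of candidate solutions and repeatedly weights, resamples, and perturbs them toward higher-scoring regions; because it needs only objective evaluations, it can handle non-convex policy search landscapes. The objective function, weighting scheme, and annealing schedule below are illustrative assumptions, not the thesis's exact algorithm.

```python
import math
import random

def objective(x):
    # Non-convex test function: global maximum at x = 0, local maxima elsewhere.
    return math.cos(3 * x) * math.exp(-x * x)

def particle_filter_optimize(n_particles=200, n_iters=60, noise=0.5, seed=0):
    rng = random.Random(seed)
    particles = [rng.uniform(-3.0, 3.0) for _ in range(n_particles)]
    for _ in range(n_iters):
        # Weight each particle by its (shifted, nonnegative) objective value.
        scores = [objective(x) for x in particles]
        lo = min(scores)
        weights = [s - lo + 1e-12 for s in scores]
        # Resample in proportion to weight, then perturb with Gaussian noise.
        particles = [
            rng.gauss(x, noise)
            for x in rng.choices(particles, weights=weights, k=n_particles)
        ]
        noise *= 0.9  # anneal the perturbation so the population concentrates
    return max(particles, key=objective)

best = particle_filter_optimize()
```

The anneal rate and population size trade exploration against convergence speed; too little noise risks premature collapse onto a local maximum.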
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2013 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Tripp, Charles Edison |
---|---|
Associated with | Stanford University, Department of Electrical Engineering |
Primary advisor | Shachter, Ross D |
Thesis advisor | Howard, Ronald A. (Ronald Arthur), 1934- |
Thesis advisor | Van Roy, Benjamin |
Subjects
Genre | Theses |
---|---
Bibliographic information
Statement of responsibility | Charles Edison Tripp. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis (Ph.D.)--Stanford University, 2013. |
Location | electronic resource |
Access conditions
- Copyright
- © 2013 by Charles Edison Tripp
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).