Convex methods for approximate dynamic programming


Abstract/Contents

Abstract
This thesis studies the use of convex optimization in stochastic control problems. We propose methods based on convex optimization for approximate dynamic programming. Dynamic programming is a standard approach to many stochastic control problems: the problem is decomposed into a sequence of subproblems, which are solved for a global minimizer called the value function. Traditional dynamic programming focuses on problems with finite state and action spaces, and finding a solution becomes prohibitively difficult as either space grows. With the exception of a few special cases, dynamic programming (DP) is difficult to carry out exactly for general problems. In such cases, approximate dynamic programming (ADP) gives a method for finding a good, if not optimal, policy. In this work, we rely on our ability to solve convex optimization problems numerically with great speed and reliability; using custom-generated solvers, we can speed up computation by orders of magnitude. Throughout the thesis, we use convex optimization in conjunction with approximate dynamic programming to find optimal (or suboptimal, but good) policies for an array of stochastic control problems.

In the first part of the thesis, we consider applications where we have access to a good controller, which may be very complex: a pilot controlling an aircraft, for example, or a model predictive controller with a long lookahead horizon. In such cases we can observe the actions chosen in different states, but we do not have access to the underlying objective function used by the controller (such as the pilot). We propose methods that use convex optimization to impute this underlying objective function. The result is a controller that mimics the behavior of the original controller, often with greatly reduced complexity.
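As a rough illustration of the imputation idea (a toy sketch, not the thesis's actual algorithm), the following example recovers a quadratic objective from observed state-action pairs by minimizing stationarity residuals. The controller, the cost parameterization, and the normalization (fixing the quadratic term to the identity to remove scale ambiguity) are all invented for this example.

```python
import numpy as np

# Toy sketch (hypothetical data): a hidden controller picks u minimizing
#   f(u; x) = 0.5 * ||u||^2 + x' S u,
# whose optimality condition u + S' x = 0 gives the policy u = -S' x.
rng = np.random.default_rng(0)
S_true = np.array([[1.0, 0.2],
                   [-0.4, 0.8]])   # hidden objective parameter

# Observe (state, action) demonstrations, with a little noise.
X = rng.normal(size=(200, 2))
U = -X @ S_true + 0.01 * rng.normal(size=(200, 2))

# Impute S by least squares on the stationarity residuals u_i + S' x_i
# (the quadratic term is normalized to the identity to fix the scale).
S_hat = -np.linalg.lstsq(X, U, rcond=None)[0]

print(np.round(S_hat, 2))
```

With clean or lightly noisy demonstrations, the imputed objective reproduces the observed behavior; in general, the recovered objective need only be consistent with the demonstrations, not identical to the hidden one.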
In the second part of the thesis, we develop the mathematical framework and present algorithms for carrying out projected value iteration using quadratic approximate value functions. Although there are no theoretical guarantees, in practice we achieve very good performance in a reasonable number of iterations. We consider problems in a variety of applications, including consumer behavior modeling, robotics, finance, input-constrained control, and supply chain management.
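To illustrate what projected value iteration with a quadratic value function looks like, here is a minimal sketch on an invented 1-D linear-quadratic problem: the Bellman backup is evaluated at sample states, and the result is projected back onto the quadratic basis by least squares. For this problem class the fit is exact and the iteration reaches the Riccati fixed point; the problem data are assumptions for the example.

```python
import numpy as np

# Toy sketch (hypothetical problem data): projected value iteration with a
# quadratic value function V(x) = p * x^2, for a 1-D system with dynamics
# x+ = a*x + b*u and stage cost q*x^2 + r*u^2.
a, b, q, r = 0.95, 0.5, 1.0, 0.1

xs = np.linspace(-1.0, 1.0, 50)   # sample states used in the projection step
p = 0.0                           # initial value-function coefficient

for _ in range(500):
    # Bellman backup at each sample; for a quadratic V the minimizing u is
    # linear in x, so the backup has a closed form.
    u = -(p * a * b) / (r + p * b**2) * xs
    v_backup = q * xs**2 + r * u**2 + p * (a * xs + b * u) ** 2
    # Projection: least-squares fit of the backed-up values onto the basis {x^2}.
    p_new = np.linalg.lstsq(xs[:, None] ** 2, v_backup, rcond=None)[0][0]
    if abs(p_new - p) < 1e-12:
        break
    p = p_new

print(round(p, 4))   # approaches the Riccati fixed point for this problem
```

For richer state spaces the projection step becomes a genuine least-squares (or semidefinite-constrained) fit over sampled states, which is where convex optimization enters.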

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2012
Issuance monographic
Language English

Creators/Contributors

Associated with Keshavarz, Arezou
Associated with Stanford University, Department of Electrical Engineering
Primary advisor Boyd, Stephen P
Thesis advisor Boyd, Stephen P
Thesis advisor Judd, Kenneth L
Thesis advisor Van Roy, Benjamin

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Arezou Keshavarz.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2012.
Location electronic resource

Access conditions

Copyright
© 2012 by Arezou Keshavarz
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
