Design and implementation of stochastic control policies via convex optimization

Wang, Yang; Stanford University, Department of Electrical Engineering

Design and implementation of stochastic control policies via convex optimization

<a href="https://embed.stanford.edu/iframe/?url=https%3A%2F%2Fpurl.stanford.edu%2Fkn566xz4377" class="su-underline">Show Content</a>

Abstract/Contents

Abstract: In this dissertation we consider the design and implementation of control policies for stochastic control problems with arbitrary dynamics, objective, and constraints. In some very special cases, these problems can be solved analytically. For instance, when the dynamics are linear, and the objective is quadratic, the optimal control policy is linear state feedback. Another simple case where the optimal policy can be computed exactly is when the state and action spaces are finite, in which case methods such as value iteration or policy iteration can be used. When the state and action spaces are infinite, but low dimensional, the optimal control problem can be solved by gridding or other discretization methods. In general however, the optimal control policy cannot be tractably computed. In such situations, there are many methods for finding suboptimal controllers that hopefully achieve a small objective value. One particular method we will discuss in detail is approximate dynamic programming (ADP), which relies on an expression for the optimal policy in terms of the value function of the stochastic control problem. In ADP we use the same expression as the optimal policy, but replace the true value function with an approximation. Another widely-used suboptimal policy is receding horizon control (RHC), also known as model predictive control (MPC). In MPC, we solve an optimization problem at each time step to determine a plan of action over a fixed time horizon, and then apply the first input from the plan. At the next time step we repeat the planning process, solving a new optimization problem, with the time horizon shifted one step forward. In the design of policies such as these, we must choose parameters, such as approximate value functions, terminal costs, time horizon, to achieve good performance. Ideally, we would like to be able to compare the performance of a suboptimal controller with the optimal performance, which we cannot compute. In this dissertation, we describe a general method for obtaining a function that lower bounds the value function of a stochastic control problem. Our method yields a numerical lower bound on the optimal objective value, as well as a value function underestimator that can be used as a parameter for ADP or MPC. We can then compare our bound to the performance achieved by our policies. If the gap between the two is small, we can conclude that the policies are nearly optimal, and the bound is nearly tight. Thus, our method simultaneously yields suboptimal policy designs, as well as a way to certify their performance. Our underestimator/bound is non-generic, in the sense that it does not simply depend on problem dimensions and some basic assumptions about the problem data. Instead, they are computed (numerically) for each specific problem instance. We will see that for many problem families, our method is based on solving a convex optimization problem, thus avoiding the `curses of dimensionality' usually associated with dynamic programming. One drawback of the suboptimal policies we design is that an optimization problem must be solved at each time step to determine the input to apply to the system. Using conventional optimization solvers this can take seconds, if not minutes. Thus, applications of these policies have been traditionally limited to systems with relatively slow dynamics, with sampling times measured in seconds, minutes, or hours. In the second part of this dissertation, we outline a collection of optimization methods that exploit the particular structure of the control problem. Our custom methods are up to around 1000 times faster compared with generic optimization packages such as SDPT3 or SeDuMi. These advances, combined with ever-increasing computing power, extends the application of optimization based policies to a wide range of applications, including those with millisecond or microsecond sampling periods.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2012
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Wang, Yang
Associated with	Stanford University, Department of Electrical Engineering
Primary advisor	Boyd, Stephen P
Thesis advisor	Boyd, Stephen P
Thesis advisor	Lall, Sanjay
Thesis advisor	Van Roy, Benjamin
Advisor	Lall, Sanjay
Advisor	Van Roy, Benjamin

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Yang Wang.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2012.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

View in SearchWorks

Loading usage metrics...