Risk-sensitive and data-driven sequential decision making

Chow, Yinlam; Stanford University, Institute for Computational and Mathematical Engineering.

Abstract/Contents

Abstract: Markov decision processes (MDPs) provide a mathematical framework for modeling sequential decision making where system evolution and cost/reward depend on uncertainties and the control actions of a decision maker. MDP models have been widely adopted in numerous domains such as robotics, control systems, finance, economics, and manufacturing. At the same time, the optimization theory of MDPs serves as the theoretical underpinning of numerous dynamic programming and reinforcement learning algorithms for stochastic control problems. While the study of MDPs is attractive for several reasons, two main challenges limit its practicality: 1) An accurate MDP model is oftentimes not available to the decision maker. Affected by modeling errors, the resultant MDP solution policy is not robust to system fluctuations. 2) The most widely adopted optimization criterion for MDPs is the risk-neutral expectation of a cumulative cost. This does not take into account the notion of risk, i.e., increased awareness of events of small probability but high consequences.

In this thesis we study multiple important aspects of risk-sensitive sequential decision making where the variability of stochastic costs and robustness to modeling errors are taken into account. First, we address a special class of risk-sensitive decision making problems in which percentile behavior is considered. Here risk is modeled by either the conditional value-at-risk (CVaR) or the value-at-risk (VaR). VaR measures risk as the maximum cost that might be incurred with respect to a given confidence level, and is appealing due to its intuitive meaning and its connection to chance constraints. The VaR risk measure has many fundamental engineering applications such as motion planning, where a safety constraint is imposed to upper bound the probability of maneuvering into dangerous regimes. Despite its popularity, VaR suffers from being unstable, and its singularity often introduces mathematical issues in optimization problems. An alternative measure that addresses most of VaR's shortcomings is CVaR. CVaR is a risk measure that is rapidly gaining popularity in various financial applications, due to its favorable computational properties (i.e., CVaR is a coherent risk) and superior ability to safeguard a decision maker from the "outcomes that hurt the most". As a risk that measures the conditional expected cost given that such cost is greater than or equal to VaR, CVaR accounts for the total cost of undesirable events (events whose probability is low but whose cost is high) and is therefore preferable in financial applications such as portfolio optimization.

Second, we consider optimization problems in which the objective function involves a coherent risk measure of the random cost. Here the term coherent risk denotes a general class of risks that satisfy convexity, monotonicity, translational invariance and positive homogeneity. These properties not only guarantee that the optimization problems are mathematically well-posed, but are also axiomatically justified. Therefore modeling risk aversion with coherent risks has already gained widespread acceptance in engineering, finance and operations research applications, among others. On the other hand, when the optimization problem is sequential, another important property of a risk measure is time consistency. A time-consistent risk metric satisfies a "dynamic-programming" style property which ensures rational decision making, i.e., the strategy that is risk-optimal at the current stage will also be deemed optimal in subsequent stages. To get the best of both worlds, the recently proposed Markov risk measures satisfy both the coherent risk properties and time consistency. Thus, to ensure rationality in risk modeling and algorithmic tractability, this thesis focuses on risk-sensitive sequential decision making problems modeled by Markov risk measures.
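
The abstract's verbal definitions of VaR and CVaR can be illustrated on a finite sample of costs. The sketch below is not taken from the thesis. It assumes equally weighted samples, takes VaR at confidence level alpha to be the smallest sample cost such that at least a fraction alpha of the samples do not exceed it, and takes CVaR to be the mean of the samples greater than or equal to that VaR. The function names `empirical_var` and `empirical_cvar` and the sample data are hypothetical. Other conventions for discrete distributions may give slightly different numbers.

```python
import numpy as np


def empirical_var(costs, alpha):
    """Empirical value-at-risk: the smallest sample cost c such that
    at least a fraction alpha of the samples are <= c (assumed convention)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    sorted_costs = np.sort(np.asarray(costs, dtype=float))
    # Index of the ceil(alpha * n)-th smallest sample, 0-based.
    k = int(np.ceil(alpha * len(sorted_costs))) - 1
    return float(sorted_costs[max(k, 0)])


def empirical_cvar(costs, alpha):
    """Empirical conditional value-at-risk: the mean of the sample costs
    that are greater than or equal to the empirical VaR."""
    costs = np.asarray(costs, dtype=float)
    var = empirical_var(costs, alpha)
    tail = costs[costs >= var]  # never empty: VaR is itself a sample value
    return float(tail.mean())


if __name__ == "__main__":
    # Hypothetical costs: mostly small, with one rare but very large outcome.
    sample_costs = [1, 2, 3, 4, 5, 6, 7, 50]
    alpha = 0.75
    print("mean cost:", float(np.mean(sample_costs)))
    print("VaR      :", empirical_var(sample_costs, alpha))
    print("CVaR     :", empirical_cvar(sample_costs, alpha))
```

On this sample the risk-neutral mean and the VaR are both small compared with the one extreme outcome, while the CVaR is pulled upward by it. This is the sense in which CVaR accounts for the cost of low-probability, high-cost events.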

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2017
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Chow, Yinlam
Associated with	Stanford University, Institute for Computational and Mathematical Engineering.
Primary advisor	Pavone, Marco, 1980-
Thesis advisor	Pavone, Marco, 1980-
Thesis advisor	Ghavamzadeh, Mohammad, 1972-
Thesis advisor	Johari, Ramesh, 1976-

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Yinlam Chow.
Note	Submitted to the Institute for Computational and Mathematical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2017.
Location	electronic resource
