Control, filtering, learning, and multi-robot algorithms for large graph-based Markov decision processes

Abstract/Contents

Abstract
This thesis derives a series of algorithms to enable the use of a class of structured models, known as graph-based Markov decision processes (GMDPs), for applications involving a collection of interacting processes. Many large-scale dynamic processes of recent interest are described by GMDPs, including disease epidemics, forest wildfires, robot swarms, and social networks. For the discrete-time, discrete-space graph-based models we consider in this thesis, each vertex in the graph corresponds to a standard Markov decision process (MDP) and edges between vertices correspond to the coupling interactions that influence the state transitions of each MDP. For example, in a forest wildfire, each vertex corresponds to a tree and edges define the surrounding trees that influence the probability of a tree catching on fire. Similarly, in a disease epidemic, each vertex refers to a community and edges describe the surrounding communities whose infection levels influence the probability of an outbreak occurring. In addition, we consider a property common in large-scale processes called "Anonymous Influence." Simply stated, a GMDP exhibits Anonymous Influence if the state transitions of the individual MDPs depend on the number of influencing MDPs in particular states, and not on the identities of those MDPs.

A consequence of using GMDPs is that the equivalent standard MDP representation typically has a prohibitively large state space, observation space (for the filtering problem), and feasible action space (for the optimal control problem). Therefore, a major objective of this thesis is to derive suitable algorithms that are efficient and scalable for arbitrarily large models. Furthermore, we address the aspects necessary to use GMDPs as a modeling framework: learning model parameters from time series data, optimally allocating a limited amount of control resources, producing state estimates online given uncertain measurements, and interacting with a process modeled by a GMDP using a team of cooperative autonomous agents.

We begin by considering the problem of optimally allocating a limited amount of control resources when the underlying model state is fully observable. We leverage approximate dynamic programming methods based on linear programs to circumvent enumerating the state and action spaces, which allows us to build a scalable framework for producing value and state-action function approximations. From these functions, a constrained policy can be derived that strictly enforces a capacity constraint. We also propose a novel control framework based on bond percolation on a lattice to more easily address potentially heterogeneous properties in GMDP models. For both of these approaches, we develop analysis techniques to evaluate the quality of our approximations and to determine whether the process is controllable. We evaluate our control techniques on simulations of forest wildfires and disease epidemics, and show that they are effective on very large models and outperform comparable methods.

Next, we relax our assumption of a fully observable state and consider the problem of producing state estimates online given only noisy measurements. Here, we leverage techniques from variational inference to derive an approximately optimal message passing scheme, similar in spirit to belief propagation methods, and prove that our approach optimizes the evidence lower bound.
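To make the model structure concrete, the following is a minimal sketch of a single-vertex transition with Anonymous Influence, using the forest-wildfire example above. The three-state model, the function names, and the specific probabilities are illustrative assumptions, not the thesis's actual parameters.

```python
import random

# Illustrative three-state wildfire model; alpha and beta are made-up rates.
HEALTHY, ON_FIRE, BURNT = 0, 1, 2

def ignition_prob(num_burning_neighbors, alpha=0.3):
    # Anonymous Influence: the transition depends only on how many neighbors
    # are on fire, not on which particular neighbors they are.
    return 1.0 - (1.0 - alpha) ** num_burning_neighbors

def step_vertex(state, neighbor_states, beta=0.2):
    """Sample the next state of one vertex (tree) given its neighbors."""
    if state == HEALTHY:
        n_burning = sum(1 for s in neighbor_states if s == ON_FIRE)
        return ON_FIRE if random.random() < ignition_prob(n_burning) else HEALTHY
    if state == ON_FIRE:
        return BURNT if random.random() < beta else ON_FIRE  # burns out
    return BURNT  # burnt trees stay burnt

def step_graph(states, neighbors):
    # Given the current states, each vertex transitions independently, so the
    # joint update factorizes over the graph.
    return {v: step_vertex(states[v], [states[u] for u in neighbors[v]])
            for v in states}
```

The control approach can be illustrated with a toy approximate linear program (ALP) on a small flat MDP. This is only the generic ALP that underlies LP-based approximate dynamic programming; the thesis's factored formulation avoids the explicit enumeration of states and actions used here, and every numeric choice below is made up.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, A, K, gamma = 20, 3, 5, 0.95             # states, actions, features, discount
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over s'
R = rng.uniform(0.0, 1.0, size=(S, A))      # rewards r(s, a)
Phi = rng.normal(size=(S, K))               # feature matrix; rows are phi(s)
Phi[:, 0] = 1.0                             # include a constant feature
c = np.ones(S) / S                          # uniform state-relevance weights

# ALP: minimize c^T Phi w  s.t.  phi(s)^T w >= r(s,a) + gamma * E[phi(s')^T w]
A_ub, b_ub = [], []
for s in range(S):
    for a in range(A):
        expected_next_phi = P[s, a] @ Phi
        A_ub.append(-(Phi[s] - gamma * expected_next_phi))
        b_ub.append(-R[s, a])
res = linprog(c @ Phi, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * K)
V_approx = Phi @ res.x                      # approximate value function
print("approximate values:", np.round(V_approx, 3))
```

For the filtering problem, a fully factorized (mean-field) belief conveys the flavor of the message-passing scheme: each vertex keeps a marginal belief, predicts through its local transition model using the neighbors' marginals, and applies a Bayes update from its noisy measurement. The two-state model and all rates below are illustrative; the thesis's actual scheme is derived from variational inference and provably optimizes the evidence lower bound, which this sketch does not reproduce.

```python
import numpy as np

def predict(b, neighbors, alpha=0.3, delta=0.1):
    """Mean-field prediction for a two-state (healthy/infected) model."""
    b_next = {}
    for v, bv in b.items():
        # Under a factorized belief, the chance that no infected neighbor
        # transmits is a product over the neighbors' marginals.
        p_no_transmit = np.prod([1.0 - alpha * b[u] for u in neighbors[v]])
        b_next[v] = bv * (1.0 - delta) + (1.0 - bv) * (1.0 - p_no_transmit)
    return b_next

def update(b, y, p_tp=0.9, p_fp=0.2):
    """Bayes measurement update from noisy binary observations y[v]."""
    b_next = {}
    for v, bv in b.items():
        like1 = p_tp if y[v] else 1.0 - p_tp  # p(y[v] | infected)
        like0 = p_fp if y[v] else 1.0 - p_fp  # p(y[v] | healthy)
        b_next[v] = like1 * bv / (like1 * bv + like0 * (1.0 - bv))
    return b_next
```

Iterating these two updates over time yields online marginal estimates at a cost linear in the number of vertices and edges of the sketch's graph, which is the kind of scalability the methods above are designed for.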
We show that our filtering scheme is at least as accurate as other methods while requiring two orders of magnitude less time to produce an estimate. We further validate the need for a fast and accurate online filter by developing a certainty-equivalence framework that enables our control methods in applications with state uncertainty. We show that this framework achieves results comparable to the fully observable case with only a minimal increase in control effort.

We then turn to the problem of learning the necessary parameters of a GMDP from time series data, relaxing the assumption that the model parameters are specified a priori. We use the Expectation-Maximization framework to derive a method that approximately optimizes the likelihood of the data with favorable complexity for large GMDPs. We use publicly available data, on daily counts of Novel Coronavirus (COVID-19) cases in California by county and on Twitter interactions on a topic, to train GMDP models and show that they better explain the observed data than a model of completely independent processes.

Lastly, we develop three agent-based frameworks built around a team of cooperative autonomous aerial vehicles, which we apply to the example of forest wildfires. These frameworks relax various assumptions used in our control and state estimation methods. In the control problem, we no longer assume that control actions can be applied arbitrarily in the forest at each time step; instead, agents must account for travel time and a limited view of the forest when deciding how to move to quickly extinguish a wildfire as a team. In the filtering problem, we introduce severe partial observability and communication constraints, and agents must coordinate meetings to share information and enable effective coverage of a wildfire. We also pose a general multi-agent cooperative optimization problem and discuss distributed techniques and the conditions required to recover the optimal centralized solution. We use this class of optimization problems to motivate a more general approach to multi-agent coordination strategies than prior work.

All of our proposed algorithms are designed for efficiency and scalability, enabling the use of potentially very large GMDPs. Our methods allow GMDPs to be applied to arbitrarily complex applications while still permitting theoretical analysis and justification. Furthermore, while there are still avenues to investigate, our methods provide the groundwork for continuing to relax assumptions and address more general modeling cases. We believe our methods will spur further interest in leveraging GMDPs to model natural phenomena.
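As a closing illustration, the certainty-equivalence idea described above amounts to feeding a point estimate from the filter into a controller designed for the fully observable case. This sketch reuses the hypothetical predict and update functions from the filtering sketch earlier; the per-vertex thresholding rule and the controller interface are likewise illustrative assumptions, not the thesis's framework.

```python
def certainty_equivalence_step(b, y, neighbors, controller):
    """One step of an illustrative certainty-equivalence control loop."""
    b = update(predict(b, neighbors), y)              # filter the noisy measurements y
    x_hat = {v: int(p > 0.5) for v, p in b.items()}   # per-vertex point estimate
    action = controller(x_hat)                        # act as if x_hat were the true state
    return b, action
```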

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2020
Publication date 2020
Issuance monographic
Language English

Creators/Contributors

Author Haksar, Ravi
Degree supervisor Okamura, Allison
Degree supervisor Schwager, Mac
Thesis advisor Okamura, Allison
Thesis advisor Schwager, Mac
Thesis advisor Kochenderfer, Mykel J., 1980-
Thesis advisor Van Roy, Benjamin
Degree committee member Kochenderfer, Mykel J., 1980-
Degree committee member Van Roy, Benjamin
Associated with Stanford University, Department of Mechanical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Ravi N. Haksar.
Note Submitted to the Department of Mechanical Engineering.
Thesis Ph.D., Stanford University, 2020.
Location electronic resource

Access conditions

Copyright
© 2020 by Ravi Haksar
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
