Monte Carlo planning and reinforcement learning for large scale sequential decision problems


Abstract/Contents

Abstract
Autonomous agents have the potential to perform tasks that would otherwise be too repetitive, difficult, or dangerous for humans. Solving many of these problems requires reasoning over sequences of decisions in order to reach a goal. Autonomous driving, inventory management, and medical diagnosis and treatment are all examples of important real-world sequential decision problems. Approximate solution methods such as reinforcement learning and Monte Carlo planning have achieved superhuman performance in some domains. In these methods, agents learn good actions to take in response to inputs. Problems with many widely varying inputs or possible actions remain challenging to solve efficiently without extensive problem-specific engineering.

One of the key challenges in solving sequential decision problems is efficiently exploring the many different paths an agent may take. For most problems, it is infeasible to test every possible path, and many existing approaches explore paths using simple random sampling. Problems in which many different actions may be taken at each step often require more efficient exploration to be solved. Large, unstructured input spaces can also challenge conventional learning approaches: agents must learn to recognize inputs that are functionally similar while simultaneously learning an effective decision strategy. As a result of these challenges, learning agents are often limited to tasks in virtual domains, where very large numbers of trials can be conducted relatively safely and cheaply. Furthermore, when problems are solved using black-box models such as neural networks, the resulting decision-making policy is impossible for a human to meaningfully interpret. This can also limit the use of learning agents to low-regret tasks such as image classification or video game playing.

The work in this thesis addresses these challenges of learning in large-space sequential decision problems. The thesis first considers methods to improve the scaling of deep reinforcement learning and Monte Carlo tree search. We present neural network architectures for the common case of exchangeable object inputs in deep reinforcement learning; the presented architecture accelerates learning by efficiently sharing learned representations among objects of the same type. The thesis then addresses methods to efficiently explore large action spaces in Monte Carlo tree search. We present two algorithms, PA-POMCPOW and BOMCP, that improve search by guiding exploration toward actions with good expected performance or information gain. We then propose methods to improve the use of offline-learned policies within online Monte Carlo planning through importance sampling and experience generalization. Finally, we study methods to interpret learned policies and expected search performance. Here, we present a method to represent high-dimensional policies with interpretable local surrogate trees. We also propose bounds on the error rates of Monte Carlo estimation that can be computed numerically from empirical quantities.
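The abstract does not spell out the exchangeable-object architecture, so the following is a minimal sketch of the general idea, in the spirit of Deep Sets (Zaheer et al., 2017): a per-object network phi applied with shared weights, followed by a symmetric pooling step. The class name, layer sizes, and mean pooling are illustrative assumptions, not the design presented in the thesis.

    # Minimal sketch of a permutation-invariant encoder for exchangeable
    # object inputs (Deep Sets style); sizes and names are assumptions.
    import torch
    import torch.nn as nn

    class ExchangeableEncoder(nn.Module):
        def __init__(self, obj_dim, hidden=64, out_dim=32):
            super().__init__()
            # phi is applied to every object with the same weights, so the
            # learned representation is shared among objects of one type.
            self.phi = nn.Sequential(
                nn.Linear(obj_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            # rho maps the pooled summary to the final state embedding.
            self.rho = nn.Linear(hidden, out_dim)

        def forward(self, objs):
            # objs: (batch, n_objects, obj_dim). Mean pooling over the
            # object axis makes the output invariant to object ordering.
            return self.rho(self.phi(objs).mean(dim=1))

    encoder = ExchangeableEncoder(obj_dim=4)
    x = torch.randn(2, 7, 4)  # 2 states, each with 7 interchangeable objects
    assert torch.allclose(encoder(x), encoder(x[:, torch.randperm(7)]), atol=1e-5)

Sharing phi across objects is what yields the acceleration the abstract describes: one set of weights trains on every object instance rather than on a fixed input slot.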
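PA-POMCPOW and BOMCP themselves are specified in the thesis; as a generic illustration of the underlying idea, guiding tree-search exploration toward actions that score well under a prior, here is a PUCT-style selection rule. The function name, the statistics layout, and the constant c are assumptions of this sketch, not the criteria used by either algorithm.

    import math

    def select_action(stats, prior, c=1.0):
        """stats maps action -> (visit_count, mean_value); prior maps
        action -> a heuristic score of expected performance."""
        total = sum(n for n, _ in stats.values()) or 1
        def ucb(a):
            n, q = stats[a]
            # The exploration bonus decays with visits and is scaled by
            # the prior, so promising actions are expanded first.
            return q + c * prior[a] * math.sqrt(total) / (1 + n)
        return max(stats, key=ucb)

    stats = {"left": (10, 0.2), "right": (2, 0.1), "stay": (0, 0.0)}
    prior = {"left": 0.2, "right": 0.3, "stay": 0.5}
    print(select_action(stats, prior))  # the untried, high-prior "stay"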
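For combining offline-learned policies with online planning, one standard ingredient is importance sampling: rollouts generated under a behavior policy are reweighted to estimate the value of a target policy. The self-normalized estimator below is a textbook sketch under that reading, not the specific estimator developed in the thesis.

    def is_value_estimate(trajectories, target_prob, behavior_prob):
        """trajectories: list of (steps, G), where steps is a list of
        (state, action) pairs and G is the observed return."""
        total, weight_sum = 0.0, 0.0
        for steps, G in trajectories:
            w = 1.0
            for s, a in steps:
                # Likelihood ratio corrects for acting under the wrong policy.
                w *= target_prob(s, a) / behavior_prob(s, a)
            total += w * G
            weight_sum += w
        # Self-normalizing trades a small bias for much lower variance.
        return total / weight_sum if weight_sum else 0.0

    # Toy usage: uniform behavior policy, deterministic target policy.
    est = is_value_estimate(
        [([("s0", 1)], 1.0), ([("s0", 0)], 0.0)],
        target_prob=lambda s, a: 1.0 if a == 1 else 0.0,
        behavior_prob=lambda s, a: 0.5)
    print(est)  # 1.0: only the trajectory taking action 1 gets weight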
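The interpretable local surrogate trees mentioned in the abstract can be pictured with the standard local-surrogate recipe: sample states near a query state, label each with the black-box policy's chosen action, and fit a small decision tree that is readable on its own. This sketch uses scikit-learn and a toy stand-in policy; the sampling scheme and parameters are assumptions, not the method from the thesis.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    def local_surrogate(policy, state, n_samples=500, noise=0.1, max_depth=3):
        rng = np.random.default_rng(0)
        # Perturb the query state to probe the policy in a local neighborhood.
        X = state + noise * rng.standard_normal((n_samples, state.size))
        y = np.array([policy(x) for x in X])  # actions the policy would take
        return DecisionTreeClassifier(max_depth=max_depth).fit(X, y)

    policy = lambda x: int(x[0] + x[1] > 1.0)  # toy stand-in black-box policy
    tree = local_surrogate(policy, np.array([0.5, 0.5]))
    print(export_text(tree, feature_names=["x0", "x1"]))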
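The error-rate bounds for Monte Carlo estimation are the thesis's own contribution; for orientation, the classical result in this family is Hoeffding's inequality, which already has the property the abstract highlights: every quantity in the bound is computable from the sample size, the confidence level, and the known value range. For i.i.d. samples X_1, ..., X_n in [a, b] with sample mean \bar{X}_n, in LaTeX:

    \Pr\left( \left| \bar{X}_n - \mathbb{E}[X] \right| \ge \epsilon \right)
        \le 2 \exp\!\left( -\frac{2 n \epsilon^2}{(b-a)^2} \right),

    % equivalently, with probability at least 1 - \delta:
    \left| \bar{X}_n - \mathbb{E}[X] \right|
        \le (b-a) \sqrt{\frac{\ln(2/\delta)}{2n}}.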

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Mern, John Michael
Degree supervisor Kochenderfer, Mykel J., 1980-
Thesis advisor Kochenderfer, Mykel J., 1980-
Thesis advisor Mukerji, Tapan, 1965-
Thesis advisor Schwager, Mac
Degree committee member Mukerji, Tapan, 1965-
Degree committee member Schwager, Mac
Associated with Stanford University, Department of Aeronautics and Astronautics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility John Mern.
Note Submitted to the Department of Aeronautics and Astronautics.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/rh431py7651

Access conditions

Copyright
© 2021 by John Michael Mern
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
