Monte Carlo planning and reinforcement learning for large scale sequential decision problems


Abstract/Contents

Abstract
Autonomous agents have the potential to perform tasks that would otherwise be too repetitive, difficult, or dangerous for humans. Solving many of these problems requires reasoning over sequences of decisions in order to reach a goal. Autonomous driving, inventory management, and medical diagnosis and treatment are all examples of important real-world sequential decision problems. Approximate solution methods such as reinforcement learning and Monte Carlo planning have achieved superhuman performance in some domains. In these methods, agents learn good actions to take in response to inputs. Problems with many widely varying inputs or possible actions remain challenging to solve efficiently without extensive problem-specific engineering.

One of the key challenges in solving sequential decision problems is efficiently exploring the many different paths an agent may take. For most problems, it is infeasible to test every possible path, and many existing approaches explore paths using simple random sampling. Problems in which many different actions may be taken at each step often require more efficient exploration to be solved. Large, unstructured input spaces can also challenge conventional learning approaches: agents must learn to recognize inputs that are functionally similar while simultaneously learning an effective decision strategy. As a result of these challenges, learning agents are often limited to tasks in virtual domains, where very large numbers of trials can be conducted relatively safely and cheaply. Furthermore, when problems are solved using black-box models such as neural networks, the resulting decision-making policy is impossible for a human to meaningfully interpret. This can also limit the use of learning agents to low-regret tasks such as image classification or video game playing.

The work in this thesis addresses these challenges of learning in large-space sequential decision problems. The thesis first considers methods to improve the scaling of deep reinforcement learning and Monte Carlo tree search. We present neural network architectures for the common case of exchangeable object inputs in deep reinforcement learning; the presented architecture accelerates learning by efficiently sharing learned representations among objects of the same type. The thesis then addresses methods to efficiently explore large action spaces in Monte Carlo tree search. We present two algorithms, PA-POMCPOW and BOMCP, that improve search by guiding exploration toward actions with good expected performance or information gain. We then propose methods to improve the use of offline-learned policies within online Monte Carlo planning through importance sampling and experience generalization. Finally, we study methods to interpret learned policies and expected search performance. Here, we present a method to represent high-dimensional policies with interpretable local surrogate trees. We also propose bounds on the error rates of Monte Carlo estimation that can be computed numerically from empirical quantities.
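The abstract does not spell out the exchangeable-object architecture, so the following is a minimal sketch of the general idea, in the spirit of Deep Sets (Zaheer et al., 2017): a per-object network phi applied with shared weights, followed by a symmetric pooling step. The class name, layer sizes, and mean pooling are illustrative assumptions, not the design presented in the thesis.

    # Minimal sketch of a permutation-invariant encoder for exchangeable
    # object inputs (Deep Sets style); sizes and names are assumptions.
    import torch
    import torch.nn as nn

    class ExchangeableEncoder(nn.Module):
        def __init__(self, obj_dim, hidden=64, out_dim=32):
            super().__init__()
            # phi is applied to every object with the same weights, so the
            # learned representation is shared among objects of one type.
            self.phi = nn.Sequential(
                nn.Linear(obj_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            # rho maps the pooled summary to the final state embedding.
            self.rho = nn.Linear(hidden, out_dim)

        def forward(self, objs):
            # objs: (batch, n_objects, obj_dim). Mean pooling over the
            # object axis makes the output invariant to object ordering.
            return self.rho(self.phi(objs).mean(dim=1))

    encoder = ExchangeableEncoder(obj_dim=4)
    x = torch.randn(2, 7, 4)  # 2 states, each with 7 interchangeable objects
    assert torch.allclose(encoder(x), encoder(x[:, torch.randperm(7)]), atol=1e-5)

Sharing phi across objects is what yields the acceleration the abstract describes: one set of weights trains on every object instance rather than on a fixed input slot.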
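PA-POMCPOW and BOMCP themselves are specified in the thesis; as a generic illustration of the underlying idea, guiding tree-search exploration toward actions that score well under a prior, here is a PUCT-style selection rule. The function name, the statistics layout, and the constant c are assumptions of this sketch, not the criteria used by either algorithm.

    import math

    def select_action(stats, prior, c=1.0):
        """stats maps action -> (visit_count, mean_value); prior maps
        action -> a heuristic score of expected performance."""
        total = sum(n for n, _ in stats.values()) or 1
        def ucb(a):
            n, q = stats[a]
            # The exploration bonus decays with visits and is scaled by
            # the prior, so promising actions are expanded first.
            return q + c * prior[a] * math.sqrt(total) / (1 + n)
        return max(stats, key=ucb)

    stats = {"left": (10, 0.2), "right": (2, 0.1), "stay": (0, 0.0)}
    prior = {"left": 0.2, "right": 0.3, "stay": 0.5}
    print(select_action(stats, prior))  # the untried, high-prior "stay"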
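For combining offline-learned policies with online planning, one standard ingredient is importance sampling: rollouts generated under a behavior policy are reweighted to estimate the value of a target policy. The self-normalized estimator below is a textbook sketch under that reading, not the specific estimator developed in the thesis.

    def is_value_estimate(trajectories, target_prob, behavior_prob):
        """trajectories: list of (steps, G), where steps is a list of
        (state, action) pairs and G is the observed return."""
        total, weight_sum = 0.0, 0.0
        for steps, G in trajectories:
            w = 1.0
            for s, a in steps:
                # Likelihood ratio corrects for acting under the wrong policy.
                w *= target_prob(s, a) / behavior_prob(s, a)
            total += w * G
            weight_sum += w
        # Self-normalizing trades a small bias for much lower variance.
        return total / weight_sum if weight_sum else 0.0

    # Toy usage: uniform behavior policy, deterministic target policy.
    est = is_value_estimate(
        [([("s0", 1)], 1.0), ([("s0", 0)], 0.0)],
        target_prob=lambda s, a: 1.0 if a == 1 else 0.0,
        behavior_prob=lambda s, a: 0.5)
    print(est)  # 1.0: only the trajectory taking action 1 gets weight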
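The interpretable local surrogate trees mentioned in the abstract can be pictured with the standard local-surrogate recipe: sample states near a query state, label each with the black-box policy's chosen action, and fit a small decision tree that is readable on its own. This sketch uses scikit-learn and a toy stand-in policy; the sampling scheme and parameters are assumptions, not the method from the thesis.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    def local_surrogate(policy, state, n_samples=500, noise=0.1, max_depth=3):
        rng = np.random.default_rng(0)
        # Perturb the query state to probe the policy in a local neighborhood.
        X = state + noise * rng.standard_normal((n_samples, state.size))
        y = np.array([policy(x) for x in X])  # actions the policy would take
        return DecisionTreeClassifier(max_depth=max_depth).fit(X, y)

    policy = lambda x: int(x[0] + x[1] > 1.0)  # toy stand-in black-box policy
    tree = local_surrogate(policy, np.array([0.5, 0.5]))
    print(export_text(tree, feature_names=["x0", "x1"]))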
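The error-rate bounds for Monte Carlo estimation are the thesis's own contribution; for orientation, the classical result in this family is Hoeffding's inequality, which already has the property the abstract highlights: every quantity in the bound is computable from the sample size, the confidence level, and the known value range. For i.i.d. samples X_1, ..., X_n in [a, b] with sample mean \bar{X}_n, in LaTeX:

    \Pr\left( \left| \bar{X}_n - \mathbb{E}[X] \right| \ge \epsilon \right)
        \le 2 \exp\!\left( -\frac{2 n \epsilon^2}{(b-a)^2} \right),

    % equivalently, with probability at least 1 - \delta:
    \left| \bar{X}_n - \mathbb{E}[X] \right|
        \le (b-a) \sqrt{\frac{\ln(2/\delta)}{2n}}.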

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Mern, John Michael
Degree supervisor Kochenderfer, Mykel J., 1980-
Thesis advisor Kochenderfer, Mykel J., 1980-
Thesis advisor Mukerji, Tapan, 1965-
Thesis advisor Schwager, Mac
Degree committee member Mukerji, Tapan, 1965-
Degree committee member Schwager, Mac
Associated with Stanford University, Department of Aeronautics and Astronautics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility John Mern.
Note Submitted to the Department of Aeronautics and Astronautics.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/rh431py7651

Access conditions

Copyright
© 2021 by John Michael Mern
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
