Approximate methods for validating autonomous systems in simulation


Abstract
Because of the safety-critical nature of many autonomous systems, validation is essential before deployment. However, validation is difficult: most of these systems act in high-dimensional spaces that make formal methods intractable, and failures are too rare to rely on physical testing. Instead, systems must be validated approximately in simulation. How to perform this validation tractably while still ensuring safety is an open problem. One approach to validation is adaptive stress testing (AST), in which finding the most likely failure in simulation is formulated as a Markov decision process (MDP). Reinforcement learning techniques can then be used to validate a system through falsification. We are interested in validating agents that act in large, continuous, and complex spaces, where it is almost always possible to force a failure. Optimizing to find the most likely failure improves the relevance of the failures uncovered and provides valuable information to designers.

This thesis presents two new techniques for solving the MDP to find failures: (1) a deep reinforcement learning (DRL) based approach and (2) a Go-Explore (GE) based approach. Scalability is key to efficiently validating an autonomous agent, for which large, continuous state and action spaces lead to an explosion in the number of possible scenario rollouts. This problem is exacerbated by the fact that designers are often interested in a space of similar test scenarios starting from slightly different initial conditions, and running a validation method many times from different initial conditions can quickly become intractable. DRL has been shown to outperform traditional reinforcement learning techniques, such as Monte Carlo tree search (MCTS), on problems with continuous state spaces. In addition to its scalability advantages, DRL can use recurrent networks to explicitly capture the sequential structure of a policy. This thesis presents a DRL reinforcement learner for AST based on recurrent neural networks (RNNs). By using an RNN, the reinforcement learner learns a policy that generalizes across initial conditions while also providing the scalability advantages of deep learning.

While DRL techniques scale well, they rely on a constant reward signal to guide the agent toward better solutions during training. For validation, domain experts can sometimes provide a heuristic that guides the reinforcement learner toward failures. Without such a heuristic, however, the task becomes a hard-exploration problem. GE has shown state-of-the-art results on traditional hard-exploration benchmarks such as Montezuma's Revenge. This thesis uses the tree-search phase of Go-Explore to find failures without heuristics in domains where DRL and MCTS do not find failures. In addition, this thesis shows that phase 2 of Go-Explore, the backwards algorithm, can often be used to improve the likelihood of failures found by any reinforcement learning method, with or without heuristics.

Autonomous vehicles are an example of an autonomous system that acts in a large, continuous state space. In addition, failures are rare events for autonomous vehicles, with some experts proposing that they will not be safe enough until they crash only once every 1.0 × 10⁹ miles. Consequently, validating the safety of autonomous systems generally requires high-fidelity simulators that adequately capture the variability of real-world scenarios. However, it is generally not feasible to exhaustively search the space of simulation scenarios for failures. This thesis presents a way of using low-fidelity simulation rollouts, which are generally much cheaper and faster to generate, to reduce the number of high-fidelity simulation rollouts needed to find failures, allowing autonomous vehicles to be validated at scale in high-fidelity simulators.

As autonomous systems become more prevalent and their development more widespread and distributed, validation techniques likewise must become widely available. Toward that end, the final contribution of this thesis is the AST Toolbox, an open-source Python package for applying AST to any autonomous system. The Toolbox contains pre-implemented MCTS, DRL, and GE reinforcement learners, allowing designers to apply the work of this thesis to validating their own systems. In addition, the Toolbox provides templates to simplify the process of wrapping a system and simulator in a format conducive to reinforcement learning.
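
To make the AST formulation summarized above concrete, the following is a minimal, self-contained Python sketch of an AST-style falsification loop. It is illustrative only: the ToySim simulator, its reset/step/is_failure interface, and the cross-entropy-style search over disturbance sequences are hypothetical stand-ins, not the AST Toolbox API. The reward shaping, a per-step log-likelihood for each sampled disturbance plus a terminal penalty based on a heuristic distance to failure when no failure is reached, follows the general AST reward structure described above.

import numpy as np

# Hypothetical toy simulator: a 1-D "gap to collision" that shrinks with disturbances.
class ToySim:
    def reset(self, init_gap=5.0):
        self.gap = init_gap
        return self.gap

    def step(self, disturbance):
        # An environment disturbance (e.g., pedestrian acceleration noise) closes the gap.
        self.gap -= max(disturbance, 0.0)
        return self.gap

    def is_failure(self):
        return self.gap <= 0.0  # collision

def ast_rollout(sim, disturbances, sigma=1.0, horizon=20, miss_penalty=1e4):
    """Score one rollout with an AST-style reward: the log-likelihood of each
    disturbance under a zero-mean Gaussian model, plus a terminal penalty
    proportional to the remaining distance to failure if no failure occurs."""
    sim.reset()
    total = 0.0
    for t in range(horizon):
        x = disturbances[t]
        total += -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
        sim.step(x)
        if sim.is_failure():
            return total, True
    return total - miss_penalty - sim.gap, False

# Naive cross-entropy-style search over disturbance sequences; a real solver
# would be one of the MCTS, DRL, or Go-Explore reinforcement learners discussed above.
rng = np.random.default_rng(0)
mean = np.zeros(20)
for _ in range(50):
    samples = rng.normal(mean, 1.0, size=(200, 20))
    scores = np.array([ast_rollout(ToySim(), s)[0] for s in samples])
    mean = samples[np.argsort(scores)[-20:]].mean(axis=0)

reward, failed = ast_rollout(ToySim(), mean)
print(f"failure found: {failed}, most-likely-failure reward: {reward:.2f}")

In a real AST setup, the toy simulator would be replaced by a high- or low-fidelity scenario simulator, the Gaussian model by the simulator's actual disturbance distributions, and the naive search by one of the reinforcement learners described above.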

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2021; ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Koren, Mark
Degree supervisor Kochenderfer, Mykel J., 1980-
Thesis advisor Kochenderfer, Mykel J., 1980-
Thesis advisor Gerdes, J. Christian
Thesis advisor Lee, Ritchie
Thesis advisor Sadigh, Dorsa
Degree committee member Gerdes, J. Christian
Degree committee member Lee, Ritchie
Degree committee member Sadigh, Dorsa
Associated with Stanford University, Department of Aeronautics and Astronautics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Mark Koren.
Note Submitted to the Department of Aeronautics and Astronautics.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/pv383pd8838

Access conditions

Copyright
© 2021 by Mark Koren
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
