Uncertainty-aware spatiotemporal perception for autonomous vehicles

Abstract/Contents

Abstract
Autonomous vehicles are set to revolutionize transportation in terms of safety and efficiency. However, autonomous systems still have challenges operating in complex human environments, such as an autonomous vehicle in a cluttered, dynamic urban setting. A key obstacle to deploying autonomous systems on the road is understanding, anticipating, and making inferences about human behaviors. Autonomous perception builds a general understanding of the environment for a robot. This includes making inferences about human behaviors in both space and time. Humans are difficult to model due to their vastly diverse behaviors and rapidly evolving objectives. Moreover, in cluttered settings, there are computational and visibility limitations. However, humans also possess desirable capabilities, such as their ability to generalize beyond their observed environment. Although learning-based systems have had success in recent years in modeling and imitating human behavior, efficiently capturing the data and model uncertainty for these systems remains an open problem.
This thesis proposes algorithmic advances to uncertainty-aware autonomous perception systems in human environments. We make system-level contributions to spatiotemporal robot perception that reasons about human behavior, and foundational advancements in uncertainty-aware machine learning models for trajectory prediction. These contributions enable robotic systems to make uncertainty- and socially-aware spatiotemporal inferences about human behavior.
Traditional robot perception is object-centric and modular, consisting of object detection, tracking, and trajectory prediction stages. These systems can fail prior to the prediction stage due to partial occlusions in the environment. We thus propose an alternative end-to-end paradigm for spatiotemporal environment prediction from a map-centric occupancy grid representation. Occupancy grids are robust to partial occlusions, can handle an arbitrary number of human agents in the scene, and do not require a priori information regarding the environment. We investigate the performance of computer vision techniques in this context and develop new mechanisms tailored to the task of spatiotemporal environment prediction.
Spatially, robots also need to reason about fully occluded agents in their environment, which may occur due to sensor limitations or other agents on the road obstructing the field of view. Humans excel at extrapolating from their experiences by making inferences from observed social behaviors. We draw inspiration from human intuition to fill in portions of the robot's map that are not observable by traditional sensors. We infer occupancy in these occluded regions by learning a multimodal mapping from observed human driver behaviors to the environment ahead of them, thus treating people as sensors. Our system handles multiple observed agents to maximally inform the occupancy map around the robot.
In order to safely integrate human behavior modeling into the robot autonomy stack, the perception system must efficiently account for uncertainty. Human behavior is often modeled using discrete latent spaces in learning-based models to capture the multimodality in the distribution. For example, in a trajectory prediction task, there may be multiple valid future predictions given a past trajectory. To accurately model this latent distribution, the latent space needs to be sufficiently large, leading to tractability concerns for downstream tasks, such as path planning. We address this issue by proposing a sparsification algorithm for discrete latent sample spaces that can be applied post hoc without sacrificing model performance. Our approach successfully balances multimodality and sparsity to achieve efficient data uncertainty estimation.
Aside from modeling data uncertainty, learning-based autonomous systems must be aware of their model uncertainty or what they do not know. Flagging out-of-distribution or unknown scenarios encountered in the real world could be helpful to downstream autonomy stack components and to engineers for further system development. Although the machine learning community has been prolific in model uncertainty estimation for small benchmark problems, relatively little work has been done on estimating this uncertainty in complex, learning-based robotic systems. We propose efficiently learning the model uncertainty over an interpretable, low-dimensional latent space in the context of a trajectory prediction task.
The algorithms presented in this thesis were validated on real-world autonomous driving data and baselined against state-of-the-art techniques. We show that drawing inspiration from human-level reasoning while modeling the associated uncertainty can inform environment understanding for autonomous perception systems. The contributions made in this thesis are a step towards uncertainty- and socially-aware autonomous systems that can function seamlessly in human environments.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022; 2022
Issuance monographic
Language English

Creators/Contributors

Author Itkina, Mikhal
Degree supervisor Kochenderfer, Mykel J, 1980-
Thesis advisor Kochenderfer, Mykel J, 1980-
Thesis advisor Sadigh, Dorsa
Thesis advisor Schwager, Mac
Degree committee member Sadigh, Dorsa
Degree committee member Schwager, Mac
Associated with Stanford University, Department of Aeronautics and Astronautics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Masha (Mikhal) Itkina.
Note Submitted to the Department of Aeronautics and Astronautics.
Thesis Thesis Ph.D. Stanford University 2022.
Location https://purl.stanford.edu/yk710jh3806

Access conditions

Copyright
© 2022 by Mikhal Itkina
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
