Multi-agent mapping and tracking with novel environment representations

Abstract/Contents

Abstract
The mission of this thesis is to explore the extremes of sparsity and fine detail in scene representations for robot perception. We introduce and repurpose data structures uncommon to robotics and build distributed mapping and tracking frameworks that exploit their unique benefits, from high-level wireframe and topological maps to an object-oriented bundle adjustment method capable of rendering the scene in photorealistic detail.

We begin by considering the problem of scouting an area with an arbitrary number of resource-constrained robots to produce a novel type of map called a wireframe. This is an embedded labeled directed graph structure that compactly represents geometry as well as occlusion and frontier information, which can be leveraged to guide the robots into unexplored areas. We develop a multi-robot mapping framework around this minimal map representation so that any number of robots can cooperate asynchronously to generate this sparse map of the environment. A particle filter framework is tailored to this map structure, allowing it to be updated iteratively and to maintain multiple hypotheses while the robots continue to collect data. Simulations demonstrate the improved performance of this map type over standard representations, as well as the efficiency gains from employing multiple robots.

Next, we present another high-level distributed mapping framework that produces a topological map of continuous areas with multiple disjoint subregions. Example environments include a building floor divided into rooms, an archipelago divided into islands, or a cave system with individual caverns. In all of these scenarios it is useful to understand the connections between subregions, such as which rooms have doors between them or which islands are reachable from one another. No metric information is stored in this model; it is intended not for navigation but for higher-level path planning. Like the wireframe mapping algorithm, this method is distributed, so any number of robots can collaborate. Each robot uses basic sensing capabilities to move randomly until it finds the perimeter of its region, then follows the perimeter until it encounters new neighbors, at which point the robots asynchronously confirm whether those neighbors inhabit the same region. We prove that repeating this process, while sharing the latest information with robots within communication radius, leads all robots to the same correct topological map in polynomial time, a result we also demonstrate in simulation.

The methods so far have focused on creating sparse maps of large areas; we now shift to a smaller scale. The high-level maps allow for planning, but to avoid collisions the robots need to understand where objects are relative to themselves. To that end, we develop a relative pose estimation framework that is lightweight enough to run onboard a drone. A key motivation for the structure of this algorithm is the insight that deep learning excels at processing complex input such as images, while Bayesian filtering algorithms are well understood and perform well at information synthesis but do not naturally accept images as input. We leverage the strengths of both by combining a deep learning frontend, which efficiently extracts 2D bounding box data from monocular images, with an unscented Kalman filter (UKF) backend that updates the object's estimated pose.
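
To make the frontend/backend split concrete, here is a minimal sketch, not the thesis implementation: it pairs a synthetic stand-in for the learned detector with a UKF from the filterpy library, and simplifies the state to a 3D object position in the camera frame rather than a full 6-DoF pose. The intrinsics, noise values, and the detect_bounding_box helper are all illustrative assumptions.

    import numpy as np
    from filterpy.kalman import MerweScaledSigmaPoints, UnscentedKalmanFilter

    FOCAL, CX, CY = 500.0, 320.0, 240.0   # assumed pinhole intrinsics (pixels)
    OBJ_WIDTH = 0.5                       # assumed physical object width (m)

    def fx(x, dt):
        # Constant-position motion model; process noise absorbs object motion.
        return x

    def hx(x):
        # Predicted bounding-box measurement [u, v, w]: pinhole projection of
        # the object center plus a box width that shrinks with depth.
        X, Y, Z = x
        return np.array([FOCAL * X / Z + CX, FOCAL * Y / Z + CY,
                         FOCAL * OBJ_WIDTH / Z])

    points = MerweScaledSigmaPoints(n=3, alpha=1e-3, beta=2.0, kappa=0.0)
    ukf = UnscentedKalmanFilter(dim_x=3, dim_z=3, dt=0.05,
                                hx=hx, fx=fx, points=points)
    ukf.x = np.array([0.0, 0.0, 2.0])     # initial guess: 2 m ahead of camera
    ukf.Q = np.eye(3) * 0.01              # assumed process noise
    ukf.R = np.diag([4.0, 4.0, 4.0])      # assumed pixel-level detector noise

    rng = np.random.default_rng(0)
    true_pos = np.array([0.2, -0.1, 2.5])

    def detect_bounding_box(image):
        # Hypothetical stand-in for the learned frontend: returns the true
        # projection corrupted by pixel noise.
        return hx(true_pos) + rng.normal(scale=2.0, size=3)

    for _ in range(50):
        ukf.predict()
        ukf.update(detect_bounding_box(None))

    print(ukf.x)   # converges toward true_pos; ukf.P holds the uncertainty

The filtered covariance ukf.P is what makes the feedback described next possible: a measure of how uncertain the backend is about the object's location, which can be used to focus the frontend.
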
We close the loop by using the uncertainty metrics calculated by the backend to improve the frontend's performance. Experiments on real hardware show this method to be significantly faster and more accurate than the next-best onboard method, and on datasets of real images it performs on par with state-of-the-art pose estimation algorithms that rely on depth information in addition to images.

Lastly, we examine how a new class of models called Neural Radiance Fields (NeRFs), a specific example of neural network-based implicit representations, can be used to jointly estimate the poses of objects and the robot's trajectory. A NeRF allows us to generate novel views of an object given its relative pose. Multiple NeRFs are composed into an expected image using their estimated poses, and the difference between this and the measured image produces a photometric loss. Because the representation consists of differentiable neural networks, the loss can be backpropagated through the system to directly optimize the relative poses of the objects and the robot trajectory (a minimal sketch of this optimization appears after this abstract). Small batches of robot and object poses are optimized together in a sliding window; after each batch is processed, it is combined with the previously optimized keyframes to form the full robot trajectory. This allows the robot to continue estimating its motion while continuously refining the object pose estimates. We demonstrate this method on photorealistic scenes rendered in Blender, showing how rotation and translation errors decrease as the predicted images, generated from the NeRF representations and their estimated poses, converge to match the measured images.

Together, these perception methods enable robots to understand their environments at multiple levels of abstraction, each suited to a different purpose, such as high-level path planning or local obstacle detection and avoidance. These algorithms could be combined into a single hierarchical mapping framework, giving the robot a versatile tool for a range of challenges. For robots to achieve robust autonomy, they will need flexible tools such as these to handle the layered tasks that make up complex, realistic scenarios such as search and rescue or autonomous package delivery.
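
The sketch referenced above illustrates the photometric-loss idea under heavy simplification, assuming PyTorch: a toy differentiable render_scene function with a 2-DoF image-plane pose stands in for compositing trained NeRFs, and for the full SE(3) object and trajectory optimization. Every name and number here is an illustrative assumption, not the thesis pipeline.

    import torch

    H, W = 48, 48
    vv, uu = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")

    def render_scene(xy):
        # Toy differentiable "renderer": a Gaussian blob whose image location
        # depends on a 2-DoF pose. It stands in for compositing NeRF renders
        # of each object at its estimated relative pose.
        return torch.exp(-((uu - xy[0]) ** 2 + (vv - xy[1]) ** 2) / 50.0)

    true_pose = torch.tensor([30.0, 18.0])
    measured_image = render_scene(true_pose)   # stands in for the camera image

    pose = torch.tensor([20.0, 14.0], requires_grad=True)  # initial pose guess
    optimizer = torch.optim.Adam([pose], lr=0.5)

    for _ in range(300):
        optimizer.zero_grad()
        loss = ((render_scene(pose) - measured_image) ** 2).mean()  # photometric loss
        loss.backward()    # gradients flow through the renderer to the pose
        optimizer.step()

    print(pose.detach())   # approaches true_pose as the rendered image
                           # converges to the measured one

The key property the sketch shares with the thesis method is that the renderer is differentiable end to end, so pose error is reduced purely by matching pixels, with no explicit feature correspondences.
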

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Caccavale, Adam Wilford
Degree supervisor Follmer, Sean
Degree supervisor Schwager, Mac
Thesis advisor Follmer, Sean
Thesis advisor Schwager, Mac
Thesis advisor Kennedy, Monroe
Degree committee member Kennedy, Monroe
Associated with Stanford University, Department of Mechanical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Adam Caccavale.
Note Submitted to the Department of Mechanical Engineering.
Thesis Ph.D., Stanford University, 2022.
Location https://purl.stanford.edu/hz538wr1515

Access conditions

Copyright
© 2022 by Adam Wilford Caccavale
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
