Deep learning methods for modeling interrelations of dynamic and static objects

Abstract/Contents

Abstract
We are witnessing rapid growth in autonomous platforms such as self-driving cars and social robots. These smart machines will navigate and interact with us in various scenarios. Understanding the behavior of, and relations between, humans and objects in different scenes is therefore very important for smart agents navigating these environments. Modeling the interrelations among objects and humans enables computer vision systems to better understand complex scenes, but it comes with many challenges: 1) How to model the complex scene context? 2) How to model the interrelations among objects and humans? 3) How to capture and model the unwritten rules of human social behavior? The goal of this thesis is to address some of these challenges. Modeling the interrelations of dynamic and static objects or agents within an environment applies to a wide range of domains, from autonomous driving and social robot navigation to abnormal behavior detection in surveillance. For example, for an autonomous vehicle to navigate a space, it has to understand the surrounding environment and model the interrelations of dynamic and static objects, i.e., the relationship between the agent and the static objects, the moving objects, and their joint interactions. To this end, we need to track all moving objects (the multi-object tracking problem), predict their future positions and interactions with the environment (the path prediction problem), and determine the traversability of a path given the state of static and moving objects (the traversability estimation problem). To tackle these problems, we propose several new deep learning methods:
1- In the first chapter of this thesis, we tackle the multi-object tracking problem. We propose an online method that encodes long-term temporal dependencies across multiple cues using deep learning. Our framework uses a structure of Recurrent Neural Networks (RNNs) that jointly reasons over multiple cues within a temporal window.
2- The second part of this thesis (Chapters 2 and 3) tackles the problem of path prediction. We propose two interpretable frameworks, CAR-Net and Sophie, based on an attentive Generative Adversarial Network (GAN), that leverage two sources of information: the past trajectories of all the agents in the scene and a wide top-view image of the navigation scene. The predicted trajectories obey the physical constraints of the environment while anticipating the movements and social behavior of other people.
3- Finally, the last part of this thesis (Chapters 3 and 4) tackles the traversability estimation problem. We propose GONet and GONet++, which use several deep learning networks trained on a large amount of data from a single RGB camera to identify the traversable and non-traversable spaces in the environment around an agent.
All proposed approaches were evaluated on relevant benchmarks and outperform all previous state-of-the-art methods.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2018; ©2018
Publication date 2018; 2018
Issuance monographic
Language English

Creators/Contributors

Author Sadeghian, Amir Abbas
Degree supervisor Savarese, Silvio
Thesis advisor Savarese, Silvio
Thesis advisor Sadigh, Dorsa
Thesis advisor Wetzstein, Gordon
Degree committee member Sadigh, Dorsa
Degree committee member Wetzstein, Gordon
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Amir Sadeghian.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis Ph.D. Stanford University 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Amir Abbas Sadeghian
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
