Deep object-centric 3D perception


Abstract/Contents

Abstract
Teaching machines to perceive visual content in a 3D environment as humans do is a central topic in Artificial Intelligence. The goal is to be able to process different types of 3D sensory inputs and generate symbolic or numerical descriptions of the environment to support decision making. In this thesis, we advocate an object-centric way to generate such descriptions, in which we represent an environment as a collection of 3D objects equipped with various attributes important for specific tasks. To generate such a representation, we focus on deep object-centric 3D perception, a class of approaches built upon 3D deep learning techniques. This thesis covers three critical components of deep object-centric 3D perception: constructing large-scale 3D model repositories, designing 3D deep learning frameworks to consume various formats of 3D data, and applying big data and deep learning techniques to real perception tasks. We start by providing an overview of each component. Following this, we show how we could accelerate the label acquisition process to scale up 3D model repositories so that data-hungry deep learning approaches can be applied. 3D data can usually be represented in different formats. Some of the prevalent geometric formats, such as point clouds and polygon meshes, pose a significant challenge to deep learning framework design, since traditional deep nets designed for regular data forms, e.g., images, cannot be directly applied. We then investigate how to build deep learning frameworks capable of consuming 3D shape meshes, an irregular graph-structured data format. Next, we provide two real perception applications as case studies to show how big data and 3D deep learning help the field evolve.
In particular, we study instance segmentation in 3D point clouds and develop a novel 3D object proposal network named GSPN as well as a 3D instance segmentation framework named R-PointNet, which boosts state-of-the-art instance segmentation performance by a large margin on existing benchmarks. In the second application, we go one step further and tackle detailed part-level perception. We study the problem of articulation-based object part segmentation. We show how to modularize deep network design by disentangling complex perception problems into subproblems. We conclude by summarizing our efforts and discussing the challenges and open questions in the field.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Yi, Li
Degree supervisor Guibas, Leonidas J
Thesis advisor Guibas, Leonidas J
Thesis advisor Girod, Bernd
Thesis advisor Savarese, Silvio
Degree committee member Girod, Bernd
Degree committee member Savarese, Silvio
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Li Yi.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.), Stanford University, 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Li Yi
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
