Deep object-centric 3D perception


Abstract/Contents

Abstract
Teaching machines to perceive visual content in a 3D environment as humans do is a central topic in Artificial Intelligence. The goal is to be able to process different types of 3D sensory inputs and generate symbolic or numerical descriptions of the environment to support decision making. In this thesis, we advocate an object-centric way to generate such descriptions, in which we represent an environment as a collection of 3D objects equipped with various attributes important for specific tasks. To generate such a representation, we focus on deep object-centric 3D perception, a class of approaches built upon 3D deep learning techniques. This thesis covers three critical components of deep object-centric 3D perception: constructing large-scale 3D model repositories, designing 3D deep learning frameworks to consume various formats of 3D data, and applying big data and deep learning techniques to real perception tasks. We start by providing an overview of each component. Following this, we show how we could accelerate the label acquisition process to scale up 3D model repositories so that data-hungry deep learning approaches can be applied. 3D data can usually be represented in different formats. Some of the prevalent geometric formats, such as point clouds and polygon meshes, pose a significant challenge to deep learning framework design, since traditional deep nets designed for regular data forms, e.g., images, cannot be directly applied. We then investigate how to build deep learning frameworks capable of consuming 3D shape meshes, an irregular graph-structured data format. Next, we provide two real perception applications as case studies to show how big data and 3D deep learning help the field evolve.
In particular, we study instance segmentation in 3D point clouds and develop a novel 3D object proposal network named GSPN as well as a 3D instance segmentation framework named R-PointNet, which boosts state-of-the-art instance segmentation performance by a large margin on existing benchmarks. In the second application, we go one step further and tackle detailed part-level perception. We study the problem of articulation-based object part segmentation. We show how to modularize deep network design by disentangling complex perception problems into subproblems. We conclude by summarizing our efforts and discussing the challenges and open questions in the field.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Yi, Li
Degree supervisor Guibas, Leonidas J
Thesis advisor Guibas, Leonidas J
Thesis advisor Girod, Bernd
Thesis advisor Savarese, Silvio
Degree committee member Girod, Bernd
Degree committee member Savarese, Silvio
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Li Yi.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.), Stanford University, 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Li Yi
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
