Learning compositional and actionable visual representations for 3D shapes
Abstract/Contents
- Abstract
- Humans can easily accomplish a wide variety of daily household tasks by perceiving, understanding, and interacting with diverse 3D objects in unstructured 3D environments. It has long been a goal of vision and robotics researchers to build intelligent agents with human-level visual perception and dexterous manipulation skills. With the emergence of modern machine learning, especially deep learning, the past decade has witnessed substantial progress in 3D vision and robotics. However, today's vision and robotics systems still fall short of what humans can achieve when generalizing to new tasks and environments. The representation of 3D objects, or 3D shapes, is one of the cornerstones of human-level 3D vision and robotics systems. Building such representations is notoriously challenging, given that 3D shapes have extraordinarily diverse geometries, rich functional semantics, and complicated part structures. At the same time, the complex nature of 3D tasks in vision, graphics, and robotics poses a further challenge. To this end, this thesis tackles two core problems: building effective visual representations for diverse 3D shapes and designing scalable learning frameworks for various 3D tasks. It focuses on compositional and actionable visual representations of 3D shapes that can process large-scale 3D data and support a range of downstream 3D tasks, and it is structured in two parts. The first part introduces a line of compositional representations of 3D shapes for 3D vision and graphics tasks: compared to prior 3D representations, smaller, simpler, and more reusable subcomponents of an object's geometry and functionality, such as its different parts, are discovered and assembled to reduce the complexity of 3D data. The second part discusses actionable representations of 3D shapes that address the task complexity issue, especially in robot manipulation. These actionable representations are learned from simulated interactions in a self-supervised manner, benefiting from scalable and inexpensive data generation in simulation. The thesis concludes with a discussion of future work and directions toward a more general large-scale representation learning framework for 3D shapes.
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource.
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2022
Publication date | 2022
Issuance | monographic
Language | English
Creators/Contributors
Author | Mo, Kaichun
---|---
Degree supervisor | Guibas, Leonidas J.
Thesis advisor | Guibas, Leonidas J.
Thesis advisor | Bohg, Jeannette, 1981-
Thesis advisor | Savarese, Silvio
Degree committee member | Bohg, Jeannette, 1981-
Degree committee member | Savarese, Silvio
Associated with | Stanford University, Computer Science Department
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Kaichun Mo.
---|---
Note | Submitted to the Computer Science Department.
Thesis | Thesis (Ph.D.), Stanford University, 2022.
Location | https://purl.stanford.edu/xn613mb1512
Access conditions
- Copyright
- © 2022 by Kaichun Mo
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).