Learning compositional and actionable visual representations for 3D shapes
Abstract/Contents
- Abstract
- Humans can easily accomplish a wide variety of daily household tasks by perceiving, understanding, and interacting with diverse 3D objects in unstructured 3D environments. It has long been a goal of vision and robotics researchers to build intelligent agents with human-level visual perception and dexterous manipulation skills. With the emergence of modern machine learning, especially deep learning, the past decade has witnessed substantial progress in 3D vision and robotics. However, today's vision and robotics systems still fall short of what humans can achieve when generalizing to new tasks and environments. The representation of 3D objects, or 3D shapes, is one of the cornerstones of human-level 3D vision and robotics systems. Building such representations is notoriously challenging, given that 3D shapes have extraordinarily diverse geometries, rich functional semantics, and complicated part structures. At the same time, the complex nature of 3D tasks in vision, graphics, and robotics poses a further challenge. To this end, this thesis tackles two core problems: building effective visual representations for diverse 3D shapes and designing scalable learning frameworks for various 3D tasks. It focuses on compositional and actionable visual representations of 3D shapes that can process large-scale 3D data and support a range of downstream 3D tasks, and it is structured in two parts. The first part introduces a line of compositional representations of 3D shapes for 3D vision and graphics tasks: compared to prior 3D representations, smaller, simpler, and more reusable subcomponents of an object's geometry and functionality, such as its different parts, are discovered and assembled to reduce the complexity of 3D data. The second part discusses actionable representations of 3D shapes that address the task complexity issue, especially in robot manipulation. These actionable representations are learned from simulated interactions in a self-supervised manner, benefiting from scalable and inexpensive data generation in simulation. The thesis concludes with a discussion of future work and directions toward a more general large-scale representation learning framework for 3D shapes.
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource.
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2022
Publication date | 2022
Issuance | monographic
Language | English
Creators/Contributors
Author | Mo, Kaichun
---|---
Degree supervisor | Guibas, Leonidas J.
Thesis advisor | Guibas, Leonidas J.
Thesis advisor | Bohg, Jeannette, 1981-
Thesis advisor | Savarese, Silvio
Degree committee member | Bohg, Jeannette, 1981-
Degree committee member | Savarese, Silvio
Associated with | Stanford University, Computer Science Department
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Kaichun Mo.
---|---
Note | Submitted to the Computer Science Department.
Thesis | Thesis (Ph.D.), Stanford University, 2022.
Location | https://purl.stanford.edu/xn613mb1512
Access conditions
- Copyright
- © 2022 by Kaichun Mo
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).