Category-level object perception for physical interaction


Abstract/Contents

Abstract
Perceiving and performing physical interaction is a crucial ability of humans and therefore an important topic for machine intelligence. Human-object interaction and robot-object interaction have been intensively studied for decades, leading to many research problems and applications in both the computer vision and robotics communities. Perceiving objects under physical interaction, or for the purpose of performing interaction, is thus a key problem in this field. This thesis covers works on learning interaction-oriented, object-centric visual representations for perceiving and performing physical interactions. The key idea is to perceive objects in a category-level manner and extract actionable information (e.g., pose, affordance, articulation) that is shared across different object instances from the same object category. The goal of such a vision system is to allow machines to perceive objects from a physical interaction perspective in a generalizable, interpretable, and annotation-efficient way. Being generalizable means the vision system can handle many object instances, including those it has never seen before; being interpretable means the learned visual representation should be explainable and understandable by humans; and being annotation-efficient means minimizing the amount of labels required for learning such visual representations. The thesis starts with three works on estimating and tracking category-level object pose for rigid and articulated objects, which generalize the problem of pose estimation from the instance level to the category level. Following that, we introduce a novel semi-supervised 3D object detection framework that allows annotation-efficient learning of category-level object information. Lastly, we present a work on multi-step interaction generation, in which the system learns to perceive category-level object states and their changes in human-object interaction videos and builds a generative model that can be used both for generating new interaction sequences and for robotic motion planning. The thesis concludes by summarizing the projects and discussing potential future directions in the field.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Wang, He (Researcher in computer vision)
Degree supervisor Guibas, Leonidas J
Thesis advisor Guibas, Leonidas J
Thesis advisor Finn, Chelsea
Thesis advisor Savarese, Silvio
Degree committee member Finn, Chelsea
Degree committee member Savarese, Silvio
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility He Wang.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.), Stanford University, 2021.
Location https://purl.stanford.edu/ww353xz5308

Access conditions

Copyright
© 2021 by He Wang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
