Multi-task learning for object understanding

Abstract/Contents

Abstract
Understanding the physical and semantic properties of objects is a major goal of computer vision. These properties include shape, materials, weight, position, orientation, object class, and affordances. Given the multitude of properties involved, object understanding is an inherently multi-task problem. In this thesis, we present three contributions that advance object understanding by capitalizing on multi-task learning. First, we focus on estimating an object's shape and weight from its image, proposing a model that performs this prediction effectively. We also investigate and characterize human performance on this task through a human study. Next, we explore multi-task learning more directly, addressing object class, position, and orientation. We find that incorporating all tasks within a single multi-task network typically yields poor results. To overcome this limitation, we propose a method that strategically groups tasks together to be learned by multiple smaller models, achieving improved performance using the same compute resources. Finally, we present EMMa, a massive multi-modal object dataset comprising product listing text and images for nearly 3 million objects. EMMa includes labels for materials, price, weight, and category, among other attributes. We apply multi-task learning and expectation-maximization to fill in missing data, ensuring that every object possesses a label for each attribute. Moreover, we introduce smart-labeling, a technique for rapidly adding custom properties to every object in EMMa with minimal human labeling effort. Taken together, these contributions use multi-task learning to address a significant portion of the object understanding landscape, paving the way for more efficient and accurate computer vision models.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Standley, Trevor Scott
Degree supervisor Savarese, Silvio
Degree supervisor Wu, Jiajun, (Computer scientist)
Thesis advisor Savarese, Silvio
Thesis advisor Wu, Jiajun, (Computer scientist)
Thesis advisor Koyejo, Sanmi
Degree committee member Koyejo, Sanmi
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Trevor Standley.
Note Submitted to the Computer Science Department.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/dj243qy8372

Access conditions

Copyright
© 2023 by Trevor Scott Standley
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
