Multi-task learning for object understanding
Abstract/Contents
- Abstract
- Understanding the physical and semantic properties of objects is a major goal of computer vision. These include shape, materials, weight, position, orientation, object class, and affordances. Given the multitude of properties involved, object understanding is an inherently multi-task problem. In this thesis, we present three contributions that advance object understanding by capitalizing on multi-task learning. First, we focus on estimating an object's shape and weight from its image, proposing a model that performs this prediction effectively. We also investigate and characterize human performance on this task through a human study. Next, we explore multi-task learning more directly, addressing object class, position, and orientation. We find that incorporating all tasks within a single multi-task network typically yields poor results. To overcome this limitation, we propose a method that strategically groups tasks together to be learned by multiple smaller models, achieving improved performance using the same compute resources. Finally, we present EMMa, a massive multi-modal object dataset comprising product listing text and images for nearly 3 million objects. EMMa includes labels for materials, price, weight, and category, among other attributes. We apply multi-task learning and expectation-maximization to fill in missing data, ensuring that every object possesses a label for each attribute. Moreover, we introduce smart-labeling, an innovative technique for rapidly adding custom properties to every object in EMMa with minimal human labeling effort. Taken together, these contributions effectively utilize multi-task learning to address a significant portion of the object understanding landscape, paving the way for more efficient and accurate computer vision models.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2023; ©2023 |
Publication date | 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Standley, Trevor Scott |
---|---|
Degree supervisor | Savarese, Silvio |
Degree supervisor | Wu, Jiajun, (Computer scientist) |
Thesis advisor | Savarese, Silvio |
Thesis advisor | Wu, Jiajun, (Computer scientist) |
Thesis advisor | Koyejo, Sanmi |
Degree committee member | Koyejo, Sanmi |
Associated with | Stanford University, School of Engineering |
Associated with | Stanford University, Computer Science Department |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Trevor Standley. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2023. |
Location | https://purl.stanford.edu/dj243qy8372 |
Access conditions
- Copyright
- © 2023 by Trevor Scott Standley
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).