Multi-task learning for object understanding

Standley, Trevor Scott

Abstract/Contents

Abstract: Understanding the physical and semantic properties of objects is a major goal of computer vision. These include shape, materials, weight, position, orientation, object class, and affordances. Given the multitude of properties involved, object understanding is an inherently multi-task problem. In this thesis, we present three contributions that advance object understanding by capitalizing on multi-task learning. First, we focus on estimating an object's shape and weight from its image, proposing a model that performs this prediction effectively. We also investigate and characterize human performance on this task through a human study. Next, we explore multi-task learning more directly, addressing object class, position, and orientation. We find that incorporating all tasks within a single multi-task network typically yields poor results. To overcome this limitation, we propose a method that strategically groups tasks together to be learned by multiple smaller models, achieving improved performance using the same compute resources. Finally, we present EMMa, a massive multi-modal object dataset comprising product listing text and images for nearly 3 million objects. EMMa includes labels for materials, price, weight, and category, among other attributes. We apply multi-task learning and expectation-maximization to fill in missing data, ensuring that every object possesses a label for each attribute. Moreover, we introduce smart-labeling, an innovative technique for rapidly adding custom properties to every object in EMMa with minimal human labeling effort. Taken together, these contributions effectively utilize multi-task learning to address a significant portion of the object understanding landscape, paving the way for more efficient and accurate computer vision models.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Creators/Contributors

Author	Standley, Trevor Scott
Degree supervisor	Savarese, Silvio
Degree supervisor	Wu, Jiajun, (Computer scientist)
Thesis advisor	Savarese, Silvio
Thesis advisor	Wu, Jiajun, (Computer scientist)
Thesis advisor	Koyejo, Sanmi
Degree committee member	Koyejo, Sanmi
Associated with	Stanford University, School of Engineering
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Trevor Standley.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/dj243qy8372

Access conditions

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
