Compositional representation learning: from reasoning to synthesis


Abstract/Contents

Abstract
The world we live in is inherently compositional: just as a sentence is built from phrases and words, a visual scene comprises a collection of interacting objects and entities, which in turn derive from the sum of their parts. This compositionality plays a pivotal role in our ability to understand the world, organize acquired knowledge into a rich set of concepts, and readily adapt them to novel situations and environments; indeed, it is considered one of the fundamental building blocks of human intelligence. What does compositionality mean in the context of machine intelligence? How can we encourage neural networks to develop a structured understanding of our surroundings? And how can we apply this knowledge to improve downstream tasks in vision and language? These are the key questions explored in this dissertation. We discuss ways and mechanisms to encourage multimodal neural networks to learn compositional scene representations, and in turn to operate over them compositionally, which we leverage for two downstream goals: multimodal reasoning, where we introduce models that can draw a sequence of inferences about visual scenes so as to answer textual questions about them; and visual synthesis, where a model can inversely generate pictures depicting multi-object scenes from scratch. We fulfill these aims by incorporating a graphical structure into neural networks, consisting of nodes and edges meant to capture, respectively, the objects within the scene and the relations among them.
We demonstrate how these graph-based structural priors endow neural networks with multiple desirable properties, including: data efficiency, achieved by decomposing a given task into a series of subtasks, each of which can be learned more easily; generalization, where a model can recombine known concepts in novel ways; controllability, where modifying particular components of the model's latent representation selectively induces the intended changes in its output; and interpretability of the computational process the model follows, whether to create an image or to reason over it. Throughout this work, we study the interplay and analogies between synthesis and reasoning, and show how, with the right inductive biases incorporated, the former capability can foster the latter.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Arad, Dor
Degree supervisor Bernstein, Michael S, 1984-
Degree supervisor McClelland, James L
Thesis advisor Bernstein, Michael S, 1984-
Thesis advisor McClelland, James L
Thesis advisor Guibas, Leonidas J
Thesis advisor Leskovec, Jurij
Thesis advisor Liang, Percy
Degree committee member Guibas, Leonidas J
Degree committee member Leskovec, Jurij
Degree committee member Liang, Percy
Associated with Stanford University, School of Engineering
Associated with Stanford University, Computer Science Department

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Drew A. Hudson (Dor Arad).
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2023.
Location https://purl.stanford.edu/fp269yy9833

Access conditions

Copyright
© 2023 by Dor Arad
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
