Learning Object-Centric Visual Representations for Common Sense Reasoning



People understand the world as a sum of its parts. Our effortless mental ability to simulate and imagine what will happen crucially depends on a scene representation that is compositional with respect to objects and the interactions between them. Similarly, learning representations with an emphasis on the perception and understanding of objects plays a key role in building human-like AI capable of supporting higher-level cognitive abilities such as common sense reasoning, causal reasoning, and goal-oriented planning. Many current methods in machine learning focus on learning structured representations in which objects are only implicitly represented, which poses a threat to model interpretability. Motivated by these observations, this work aims to develop algorithms for learning object-centric representations in which objects are explicitly represented. Given the importance of objects in human cognition, we draw inspiration from cognitive science to motivate our approach and provide its theoretical underpinnings. Specifically, we take a rationalist approach in augmenting deep neural network architectures to exhibit object-centric representations, and demonstrate empirically how these representations can be efficiently leveraged for downstream tasks such as controllable image synthesis.


Type of resource text
Date created June 4, 2021
Date modified December 5, 2022
Publication date September 8, 2021


Author Tan, Kevin


Subject Machine learning
Subject computer vision
Subject cognitive science
Genre Text
Genre Thesis

Bibliographic information

Access conditions

Use and reproduction
User agrees that, where applicable, content will not be used to identify or to otherwise infringe the privacy or confidentiality rights of individuals. Content distributed via the Stanford Digital Repository may be subject to additional license and use restrictions applied by the depositor.
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).


Master's Theses, Symbolic Systems Program, Stanford University


