Visual intelligence through human learning
Abstract/Contents
- Abstract
- At the core of human development is the ability to adapt to new, previously unseen stimuli. We comprehend new situations as a composition of previously seen information and ask one another for clarification when we encounter new concepts. Yet this ability to go beyond the confines of the training data remains an open challenge for artificial intelligence. My research designs visual intelligence to reason over new compositions and acquire new concepts by interacting with people. I draw on ideas from human learning (both human cognition and human interaction) to deliver representations, training frameworks, models, evaluation protocols, and interactions for computer vision. My dissertation will explore some of the challenges associated with existing vision methods and present the following two lines of work. Drawing on human cognition, I will introduce scene graphs, a cognitively grounded, compositional visual representation. With the scene graph representation, I will show that it is possible for models to learn from a finite set of situations outlined in their training data and to recognize new compositions of previously seen concepts. I will build scene graph models that can recognize new visual relationships with as few as 10 labels per relationship. Finally, I will demonstrate how scene graphs can be used to improve core computer vision tasks such as action recognition, improving existing baselines with as few as 5 training examples. Since our introduction of scene graphs, the computer vision community has developed hundreds of scene graph models and utilized scene graphs to achieve state-of-the-art results across multiple core tasks, including object localization, captioning, image generation, question answering, 3D understanding, and spatio-temporal action recognition. Drawing on human interaction, I will introduce a framework for socially situated learning.
This framework pushes agents beyond traditional active learning paradigms and enables learning from human interactions in social environments. Using this framework, I will design a real-world deployment of a socially situated agent; our agent learns to acquire new concepts by asking people targeted questions on social media about the contents of the photos they upload. By interacting with over 230K people over 8 months, our agent learns to recognize hundreds of new concepts. Finally, to promote pro-social human-computer interactions, I will demonstrate the importance of choosing appropriate metaphors to describe intelligent systems. Together, this dissertation exhibits the benefits of drawing on ideas from human learning to develop better visual intelligence. My research connects ideas from cognitive science and social psychology with advances in computer vision, natural language processing, machine learning, and human-computer interaction.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2021; ©2021 |
Publication date | 2021 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Krishna, Ranjay |
---|---|
Degree supervisor | Bernstein, Michael S, 1984- |
Degree supervisor | Li, Fei Fei, 1976- |
Thesis advisor | Bernstein, Michael S, 1984- |
Thesis advisor | Li, Fei Fei, 1976- |
Thesis advisor | Agrawala, Maneesh |
Thesis advisor | Manning, Christopher D |
Degree committee member | Agrawala, Maneesh |
Degree committee member | Manning, Christopher D |
Associated with | Stanford University, Computer Science Department |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Ranjay Krishna. |
---|---|
Note | Submitted to the Computer Science Department. |
Thesis | Thesis Ph.D. Stanford University 2021. |
Location | https://purl.stanford.edu/df658ht9106 |
Access conditions
- Copyright
- © 2021 by Ranjay Krishna
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).