Visual intelligence through human learning

Krishna, Ranjay

Abstract/Contents

Abstract: At the core of human development is the ability to adapt to new, previously unseen stimuli. We comprehend new situations as compositions of previously seen information and ask one another for clarification when we encounter new concepts. Yet this ability to go beyond the confounds of the training data remains an open challenge for artificial intelligence. My research designs visual intelligence to reason over new compositions and acquire new concepts by interacting with people. I draw on ideas from human learning (from human cognition and human interaction) to deliver representations, training frameworks, models, evaluation protocols, and interactions for computer vision. My dissertation will explore some of the challenges associated with existing vision methods and present the following two lines of work. First, drawing on human cognition, I will introduce scene graphs, a cognitively grounded, compositional visual representation. With the scene graph representation, I will show that it is possible for models to learn from the finite set of situations in their training data and still recognize new compositions of previously seen concepts. I will build scene graph models that can recognize new visual relationships with as few as 10 labels per relationship. I will also demonstrate how scene graphs can be used to improve core computer vision tasks such as action recognition, improving existing baselines with as few as 5 training examples. Since our introduction of scene graphs, the computer vision community has developed hundreds of scene graph models and utilized scene graphs to achieve state-of-the-art results across multiple core tasks, including object localization, captioning, image generation, question answering, 3D understanding, and spatio-temporal action recognition. Second, drawing on human interaction, I will introduce a framework for socially situated learning. This framework pushes agents beyond traditional active learning paradigms and enables learning from human interactions in social environments. Using this framework, I will design a real-world deployment of a socially situated agent; our agent learns to acquire new concepts by asking people targeted questions on social media about the contents of the photos they upload. By interacting with over 230K people over 8 months, our agent learns to recognize hundreds of new concepts. Finally, to promote pro-social human-computer interactions, I will demonstrate the importance of choosing appropriate metaphors to describe intelligent systems. Together, this dissertation exhibits the benefits of drawing on ideas from human learning to develop better visual intelligence. My research connects ideas from cognitive science and social psychology with advances in computer vision, natural language processing, machine learning, and human-computer interaction.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2021; ©2021
Publication date	2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Krishna, Ranjay
Degree supervisor	Bernstein, Michael S, 1984-
Degree supervisor	Li, Fei Fei, 1976-
Thesis advisor	Bernstein, Michael S, 1984-
Thesis advisor	Li, Fei Fei, 1976-
Thesis advisor	Agrawala, Maneesh
Thesis advisor	Manning, Christopher D
Degree committee member	Agrawala, Maneesh
Degree committee member	Manning, Christopher D
Associated with	Stanford University, Computer Science Department

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Ranjay Krishna.
Note	Submitted to the Computer Science Department.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/df658ht9106

Access conditions

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
