Understanding feature use divergences between human and machine vision

Hermann, Katherine Laura

Abstract/Contents

Abstract: Recent work has highlighted a seemingly sharp divergence between human and machine vision: studies have argued that, whereas people exhibit a shape bias, preferring to classify objects according to their shape (Landau et al., 1988; Geirhos et al., 2018; Kucker et al., 2019), standard ImageNet-trained CNNs privilege texture (Geirhos et al., 2018). How prevalent is this texture bias, and where does it come from? I will present evidence that, while both model architecture and training objective affect a model's level of texture bias, the statistics of the training data are the most important factor, and that naturalistic data augmentation schemes can ameliorate texture bias and improve generalization to out-of-distribution images. On the human side, existing studies have tested people under conditions different from those faced by a feedforward CNN; does the human-machine divergence remain when testing conditions are more fairly aligned? In experiments using brief stimulus presentations, we find that people do still privilege shape over texture. Even so, texture information plays more of a role than previously reported. This work establishes a new benchmark for assessing how "human-like" feedforward vision models are in their shape bias. Shape and texture are two features that are both useful in predicting an object's class. Zooming out, I will study models' treatment of such redundant features in a more general setting. Using synthetic data to explore which input features models learn as a function of their task relevance and difficulty of extraction, we find that CNNs are vulnerable to "feature blindness", privileging a single useful feature even when two features perfectly and redundantly predict image labels. Which of the features is privileged is predictable from the untrained model. Finally, I will discuss challenges and open questions that remain in the quest to build models with human-like visual representations.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	©2022
Publication date	2022
Issuance	monographic
Language	English

Creators/Contributors

Author	Hermann, Katherine Laura
Degree supervisor	McClelland, James L
Thesis advisor	McClelland, James L
Thesis advisor	Grill-Spector, Kalanit
Thesis advisor	Yamins, Daniel
Degree committee member	Grill-Spector, Kalanit
Degree committee member	Yamins, Daniel
Associated with	Stanford University, Department of Psychology

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Katherine Hermann.
Note	Submitted to the Department of Psychology.
Thesis	Thesis Ph.D. Stanford University 2022.
Location	https://purl.stanford.edu/gc637jd7786

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
