Understanding feature use divergences between human and machine vision

Abstract

Recent work has highlighted a seemingly sharp divergence between human and machine vision: studies have argued that, whereas people exhibit a shape bias, preferring to classify objects according to their shape (Landau et al., 1988; Geirhos et al., 2018; Kucker et al., 2019), standard ImageNet-trained CNNs privilege texture (Geirhos et al., 2018). How prevalent is this texture bias, and where does it come from? I will present evidence that, while both model architecture and training objective affect a model's level of texture bias, the statistics of the training data are the most important factor, and that naturalistic data augmentation schemes can ameliorate texture bias and improve generalization to out-of-distribution images. On the human side, existing studies have tested people under conditions different from those faced by a feedforward CNN; does the human-machine divergence remain when testing conditions are more fairly aligned? In experiments using brief stimulus presentations, we find that people do still privilege shape over texture. Even so, texture information plays more of a role than previously reported. This work establishes a new benchmark for assessing how "human-like" feedforward vision models are in their shape bias. Shape and texture are two features that are both useful for predicting an object's class. Zooming out, I will study models' treatment of such redundant features in a more general setting. Using synthetic data to explore which input features models learn as a function of their task relevance and difficulty of extraction, we find that CNNs are vulnerable to "feature blindness", privileging a single useful feature even when two features perfectly and redundantly predict image labels. Which feature is privileged is predictable from the untrained model. Finally, I will discuss challenges and open questions that remain in the quest to build models with human-like visual representations.
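
To make the redundant-feature ("feature blindness") setup concrete, here is a minimal sketch, assuming PyTorch. The stimuli (a square-vs-cross shape cue and a red-vs-green color cue), the small CNN, and all names such as make_image and make_batch are illustrative assumptions, not the thesis's actual code or data.

# Illustrative sketch (not the thesis's code): two redundant cues, each
# perfectly predictive of the label during training, then a cue-conflict
# probe to see which cue a small CNN actually learned.
import torch
import torch.nn as nn

def make_image(shape_cls, color_cls):
    """32x32 RGB image carrying a shape cue (square vs. cross) and a
    color cue (red vs. green channel). Both cues are hypothetical."""
    img = torch.zeros(3, 32, 32)
    c = 0 if color_cls == 0 else 1          # channel index encodes the color cue
    if shape_cls == 0:                      # square
        img[c, 10:22, 10:22] = 1.0
    else:                                   # cross
        img[c, 14:18, 4:28] = 1.0
        img[c, 4:28, 14:18] = 1.0
    return img

def make_batch(n, conflict=False):
    labels = torch.randint(0, 2, (n,))
    # During training the cues are redundant (color class == shape class);
    # at probe time they conflict (color flipped relative to shape).
    colors = 1 - labels if conflict else labels
    imgs = torch.stack([make_image(int(s), int(c))
                        for s, c in zip(labels, colors)])
    return imgs, labels

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 8 * 8, 2),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):                     # train on redundant-cue images
    x, y = make_batch(64)
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Cue-conflict probe: the labels below follow the *shape* cue, so accuracy
# near 1 means the model is shape-reliant, near 0 means color-reliant.
with torch.no_grad():
    x, y = make_batch(1000, conflict=True)
    acc = (model(x).argmax(1) == y).float().mean()
print(f"shape-consistent choices on cue-conflict probe: {acc:.2f}")

On data like this, a model can reach perfect training accuracy using either cue alone; only the cue-conflict probe reveals which feature it actually learned, which is the diagnostic the abstract describes.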

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Hermann, Katherine Laura
Degree supervisor McClelland, James L
Thesis advisor McClelland, James L
Thesis advisor Grill-Spector, Kalanit
Thesis advisor Yamins, Daniel
Degree committee member Grill-Spector, Kalanit
Degree committee member Yamins, Daniel
Associated with Stanford University, Department of Psychology

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Katherine Hermann.
Note Submitted to the Department of Psychology.
Thesis Thesis (Ph.D.)--Stanford University, 2022.
Location https://purl.stanford.edu/gc637jd7786

Access conditions

Copyright
© 2022 by Katherine Laura Hermann
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
