Uncovering and inducing interpretable causal structure in deep learning models


Abstract/Contents

Abstract
A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible yet consistent with the known, but often opaque, low-level causal details of the model. We argue that the theory of causal abstraction provides the mathematical foundations for the desired kinds of model explanations. In the analysis mode, we uncover causal structure using interventions on model-internal states to assess whether an interpretable high-level causal model is a faithful description of a deep learning model. In the training mode, we induce interpretable causal structure using interventions during model training to simulate counterfactuals in the deep learning model's activation space. We show how to uncover and induce causal structures in a variety of case studies on deep learning models that reason over language and/or images.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Geiger, Atticus
Degree supervisor Icard, Thomas
Degree supervisor Potts, Christopher, 1977-
Thesis advisor Icard, Thomas
Thesis advisor Potts, Christopher, 1977-
Thesis advisor Frank, Michael C, (Professor of human biology)
Thesis advisor Goodman, Noah (Noah D.)
Degree committee member Frank, Michael C, (Professor of human biology)
Degree committee member Goodman, Noah (Noah D.)
Associated with Stanford University, School of Humanities and Sciences
Associated with Stanford University, Department of Linguistics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Atticus Reed Geiger.
Note Submitted to the Department of Linguistics.
Thesis Thesis Ph.D. Stanford University 2023.
Location https://purl.stanford.edu/th321qf7186

Access conditions

Copyright
© 2023 by Atticus Geiger
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
