Uncovering and inducing interpretable causal structure in deep learning models
Abstract/Contents
- Abstract
- A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible but also consistent with the known but often opaque low-level causal details of the model. We argue that the theory of causal abstraction provides the mathematical foundations for the desired kinds of model explanations. In the analysis mode, we uncover causal structure using interventions on model-internal states to assess whether an interpretable high-level causal model is a faithful description of a deep learning model. In the training mode, we induce interpretable causal structure using interventions during model training to simulate counterfactuals in the deep learning model's activation space. We show how to uncover and induce causal structures in a variety of case studies on deep learning models that reason over language and/or images.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2023; ©2023 |
Publication date | 2023 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Geiger, Atticus |
---|---|
Degree supervisor | Icard, Thomas |
Degree supervisor | Potts, Christopher, 1977- |
Thesis advisor | Icard, Thomas |
Thesis advisor | Potts, Christopher, 1977- |
Thesis advisor | Frank, Michael C, (Professor of human biology) |
Thesis advisor | Goodman, Noah (Noah D.) |
Degree committee member | Frank, Michael C, (Professor of human biology) |
Degree committee member | Goodman, Noah (Noah D.) |
Associated with | Stanford University, School of Humanities and Sciences |
Associated with | Stanford University, Department of Linguistics |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Atticus Reed Geiger. |
---|---|
Note | Submitted to the Department of Linguistics. |
Thesis | Thesis Ph.D. Stanford University 2023. |
Location | https://purl.stanford.edu/th321qf7186 |
Access conditions
- Copyright
- © 2023 by Atticus Geiger
- License
- This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).