Uncovering and inducing interpretable causal structure in deep learning models

Abstract/Contents

Abstract: A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible yet consistent with the known but often opaque low-level causal details of the model. We argue that the theory of causal abstraction provides the mathematical foundations for such explanations. In the analysis mode, we uncover causal structure by intervening on model-internal states to assess whether an interpretable high-level causal model is a faithful description of a deep learning model. In the training mode, we induce interpretable causal structure by intervening during model training to simulate counterfactuals in the deep learning model's activation space. We show how to uncover and induce causal structures in a variety of case studies on deep learning models that reason over language and/or images.
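
Illustrative note (not part of the thesis record): the analysis mode described in the abstract can be sketched as an interchange intervention, in which an internal state from a run on a "source" input is patched into a run on a "base" input and the result is compared with the same intervention on the high-level model. The toy below is a minimal sketch under stated assumptions: the hand-written two-unit "network", the high-level variable `S`, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical low-level model: a hidden layer with two units, then a sum.
def low_level_hidden(x):
    a, b, c = x
    return [a + b, c]          # unit 0 holds a + b, unit 1 holds c

def low_level_output(h):
    return h[0] + h[1]

# Hypothetical high-level causal model: S = a + b, Y = S + c.
# Passing S explicitly is an intervention on the variable S.
def high_level(x, S=None):
    a, b, c = x
    if S is None:
        S = a + b
    return S + c

def interchange_accuracy(inputs, unit):
    """Fraction of (base, source) pairs on which the patched low-level
    output matches the high-level output with S taken from the source."""
    agree = total = 0
    for base, source in product(inputs, repeat=2):
        h = low_level_hidden(base)
        h[unit] = low_level_hidden(source)[unit]   # patch in the source activation
        low_out = low_level_output(h)
        high_out = high_level(base, S=source[0] + source[1])
        agree += (low_out == high_out)
        total += 1
    return agree / total

inputs = list(product(range(3), repeat=3))
acc_aligned = interchange_accuracy(inputs, unit=0)     # S aligned with unit 0
acc_misaligned = interchange_accuracy(inputs, unit=1)  # S aligned with unit 1
print(acc_aligned, acc_misaligned)
```

In this toy, aligning `S` with unit 0 yields agreement on every pair, while aligning it with unit 1 does not, which is the sense in which interventions can test whether a proposed high-level description is faithful.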

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2023; ©2023
Publication date	2023
Issuance	monographic
Language	English

Author	Geiger, Atticus
Degree supervisor	Icard, Thomas
Degree supervisor	Potts, Christopher, 1977-
Thesis advisor	Icard, Thomas
Thesis advisor	Potts, Christopher, 1977-
Thesis advisor	Frank, Michael C, (Professor of human biology)
Thesis advisor	Goodman, Noah (Noah D.)
Degree committee member	Frank, Michael C, (Professor of human biology)
Degree committee member	Goodman, Noah (Noah D.)
Associated with	Stanford University, School of Humanities and Sciences
Associated with	Stanford University, Department of Linguistics

Genre	Theses
Genre	Text

Statement of responsibility	Atticus Reed Geiger.
Note	Submitted to the Department of Linguistics.
Thesis	Thesis Ph.D. Stanford University 2023.
Location	https://purl.stanford.edu/th321qf7186

License: This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
