Causal aggregation of heterogeneous datasets and stabilization of feature selection procedures

Abstract/Contents

Abstract
Variable selection is becoming a key step in data analysis pipelines. Identifying true relationships between a large number of covariates and a response of interest is a challenging problem that often requires strong modeling assumptions. In this thesis we develop new variable selection methodologies in two directions, building on the recently developed knockoff procedure and on the causal invariance framework. First, we show how to aggregate heterogeneous datasets generated under different experimental settings in order to estimate causal effects and identify the covariates that are relevant in a causal sense. Our methodology efficiently uses all available information, and we propose an extension to high dimensions, where the number of samples available per environment is small. We also develop a Causal Boosting algorithm that efficiently recovers non-linear causal response functions from multiple datasets in which different subsets of covariates are randomized. Second, we improve the knockoff procedure by extending the range of datasets to which it can be applied. Going beyond Gaussian distributions, we propose a Bayesian Network knockoff sampling procedure that fits a much larger class of distributions. We also identify sources of instability in the procedure and devise an entropy-based multi-knockoff procedure to mitigate the variability in the selected set of variables induced by the procedure's randomized nature.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2022; ©2022
Publication date 2022
Issuance monographic
Language English

Creators/Contributors

Author Roquero Gimenez, Jaime
Degree supervisor Zou, James
Thesis advisor Zou, James
Thesis advisor Candès, Emmanuel J. (Emmanuel Jean)
Thesis advisor Owen, Art B
Degree committee member Candès, Emmanuel J. (Emmanuel Jean)
Degree committee member Owen, Art B
Associated with Stanford University, Department of Statistics

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Jaime Roquero Gimenez.
Note Submitted to the Department of Statistics.
Thesis Thesis Ph.D. Stanford University 2022.
Location https://purl.stanford.edu/wf262ng5402

Access conditions

Copyright
© 2022 by Jaime Roquero Gimenez
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
