Causal aggregation of heterogeneous datasets and stabilization of feature selection procedures
Abstract/Contents
- Abstract
- Variable selection is becoming a key step in any data analysis pipeline. Identifying true relationships between a large number of covariates and a response of interest is a challenging problem that often requires strong modeling assumptions. In this thesis we develop new variable selection methodologies in two different directions. Our contributions build on the recently developed knockoff procedure as well as the causal invariance framework. We show how to aggregate data from heterogeneous datasets generated under different experimental settings to estimate causal effects and identify the covariates that are relevant in a causal sense. Our methodology efficiently uses all available information, and we propose an extension to high dimensions where the number of samples available per environment is small. Finally, we develop a Causal Boosting algorithm that efficiently recovers non-linear causal response functions from multiple datasets in which different subsets of covariates are randomized. On a different topic, we improve the knockoff procedure by extending the range of datasets to which it can be applied. Going beyond Gaussian distributions, we propose a Bayesian Network knockoff sampling procedure that fits a much larger class of distributions. We also identify sources of instability in the procedure and devise an entropy-based multi-knockoff procedure to mitigate the variability in the selected set of variables induced by the randomized nature of the procedure.
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2022; ©2022 |
Publication date | 2022 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Roquero Gimenez, Jaime |
---|---|
Degree supervisor | Zou, James |
Thesis advisor | Zou, James |
Thesis advisor | Candès, Emmanuel J. (Emmanuel Jean) |
Thesis advisor | Owen, Art B |
Degree committee member | Candès, Emmanuel J. (Emmanuel Jean) |
Degree committee member | Owen, Art B |
Associated with | Stanford University, Department of Statistics |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Jaime Roquero Gimenez. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis Ph.D. Stanford University 2022. |
Location | https://purl.stanford.edu/wf262ng5402 |
Access conditions
- Copyright
- © 2022 by Jaime Roquero Gimenez
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).