Some methods to compare the real-world performance of causal estimators
- This dissertation will detail the work I have done to create methods that gauge the finite-sample performance of causal estimators. The claim of the dissertation is that, given a dataset and a set of causal estimators of the average or individual treatment effect, we can typically predict the finite-sample performance of each estimator using nothing but the observed data. The veracity of this claim is demonstrated using simulations in which the true risk of each estimator can be assessed. The methods I propose are attuned to the needs of researchers working with non-parametric models of high-dimensional observational data, as is common in clinical informatics applications. The first chapter will introduce the overall problem of causal inference, with an eye towards general non-parametric inference. This will include an introduction to key concepts such as confounding, statistical risk, and the potential outcomes framework. I will review the panoply of methods available for estimating average and individual effects, summarizing key assumptions, theoretical results, and prior findings. One key takeaway of this chapter will be that the fundamental problem of causal inference is a barrier to validation and model selection as well as to estimation: it is not clear how to assess the quality of our final estimate. This problem is especially relevant in the non-parametric, high-dimensional setting, where estimators may be sensitive to a number of user-specified hyperparameters. The second chapter will describe my first foray into the maze of available methods for causal inference. Its purpose is to give a clear motivating example of the validation and model selection problem in causal inference and of how it particularly affects high-dimensional inference, where there are many choices to be made.
I will discuss two studies I performed in which I observed that covariate balance after matching was highly dependent on the hyperparameter settings of the propensity score model, and that optimal covariate balance was not achieved by minimizing cross-validation error. Further investigation revealed that over-regularization (relative to the regularization at optimal balance) led to under-correction of imbalances, while under-regularization led to over-correction. I discovered that cross-estimation of the propensity score led to good covariate balance. However, cross-estimation did not significantly impact the final treatment effect estimates in either study, leaving open the question of how best to evaluate the impact of using different estimators or hyperparameter choices. The difficulty in assessing the impact of such strategies led me to develop a simulation framework called synth-validation that will be the focus of my third chapter. Synth-validation is a tool to predict the performance of average treatment effect estimators given a particular dataset. In synth-validation, the dataset of interest is modeled under user-imposed treatment effects to create a variety of simulations that are optimized to produce data that resemble the dataset of interest. However, since these data are simulated, the true potential outcomes are known and the true performance of all estimators can be assessed. Since the simulated data are constructed to be similar to the real data, the relative performance of estimators should be similar in both settings. I will describe synth-validation in some detail and present empirical results that demonstrate a benefit in model selection over the consistent use of any single estimator. Although synth-validation proved effective, it is computationally intensive and somewhat unorthodox. Furthermore, it is not obvious how to use synth-validation when the estimand of interest is the individual treatment effect and not an average.
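The synth-validation loop described above can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions, not the dissertation's actual implementation: the function names, the use of a gradient-boosted outcome model, the additive imposed effects, and the two toy estimators are all illustrative choices.

```python
# Hypothetical sketch of a synth-validation loop: fit a generative model to
# the observed data, simulate outcomes under a known imposed treatment effect,
# and score each candidate ATE estimator against that known truth.
# All names and modeling choices here are illustrative, not the dissertation's.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy observed dataset: covariates X, binary treatment w, outcome y.
n, p = 500, 5
X = rng.normal(size=(n, p))
w = rng.binomial(1, 0.5, size=n)
y = X[:, 0] + 0.5 * w + rng.normal(scale=0.5, size=n)

def synth_validate(X, w, y, estimators, taus=(0.0, 0.5, 1.0), n_sims=5):
    """For each user-imposed effect tau, simulate outcomes from a model fit
    to the observed data and score each estimator's squared error vs. tau."""
    mu = GradientBoostingRegressor().fit(np.column_stack([X, w]), y)
    sigma = np.std(y - mu.predict(np.column_stack([X, w])))
    base = mu.predict(np.column_stack([X, np.zeros(n)]))  # control baseline
    errors = {name: [] for name in estimators}
    for tau in taus:
        for _ in range(n_sims):
            # Simulated outcome: baseline + known additive effect + noise.
            y_sim = base + tau * w + rng.normal(scale=sigma, size=n)
            for name, est in estimators.items():
                errors[name].append((est(X, w, y_sim) - tau) ** 2)
    return {name: float(np.mean(e)) for name, e in errors.items()}

# Two toy ATE estimators to compare.
def diff_in_means(X, w, y):
    return y[w == 1].mean() - y[w == 0].mean()

def ols_adjusted(X, w, y):
    return LinearRegression().fit(np.column_stack([w, X]), y).coef_[0]

scores = synth_validate(X, w, y, {"diff": diff_in_means, "ols": ols_adjusted})
```

The returned `scores` dictionary holds the mean squared error of each estimator across the simulated datasets; the estimator with the lowest score would be selected for use on the real data.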
In my fourth chapter, I will show how several metrics that have been used for fitting individual treatment effect models or learning optimal policies can be adapted for model selection and validation. I will present results demonstrating that all of these metrics improve over random model selection and that one in particular (the R-objective) most consistently predicts the true performance of each estimator. My fifth chapter will conclude with a brief summary of my findings, all of which indicate that the finite-sample performance of causal estimators can typically be empirically assessed. I will also identify remaining barriers and avenues for future investigation.
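The R-objective mentioned above can be used as a held-out validation metric for individual treatment effect models: after estimating nuisance functions (an outcome regression and a propensity model) on a training split, each candidate effect model is scored by its R-loss on a validation split. The sketch below illustrates this idea under simplifying assumptions; the data-generating process, the split scheme, and the two toy candidates are illustrative and not drawn from the dissertation.

```python
# Hedged sketch of the R-objective (R-loss) as a validation metric for
# individual treatment effect models. Names and setup are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier

rng = np.random.default_rng(1)
n, p = 600, 5
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))            # true propensity (confounded)
w = rng.binomial(1, e)
tau_true = X[:, 1]                         # heterogeneous treatment effect
y = X[:, 0] + tau_true * w + rng.normal(scale=0.5, size=n)

# Nuisance estimates on a training split; R-loss computed on a validation split.
train, val = np.arange(n) < n // 2, np.arange(n) >= n // 2
m_hat = GradientBoostingRegressor().fit(X[train], y[train]).predict(X[val])
e_hat = GradientBoostingClassifier().fit(X[train], w[train]).predict_proba(X[val])[:, 1]

def r_loss(tau_hat):
    """R-objective: mean squared residual after removing the outcome
    regression m(x) and the propensity-centered treatment contribution."""
    resid = (y[val] - m_hat) - (w[val] - e_hat) * tau_hat
    return float(np.mean(resid ** 2))

# Score candidate effect models; a better model should typically attain
# a lower R-loss on the validation split.
candidates = {"zero_effect": np.zeros(val.sum()), "oracle": tau_true[val]}
losses = {name: r_loss(t) for name, t in candidates.items()}
```

The candidate with the smallest validation R-loss would be selected, mirroring how cross-validation error is used for ordinary prediction models.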
Stanford University, Program in Biomedical Informatics.
Submitted to the Program in Biomedical Informatics.
Thesis (Ph.D.)--Stanford University, 2019.
- © 2018 by Alejandro Schuler
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).