Causal inference with random forests
- Random forests, introduced by Breiman, have become one of the most popular machine learning algorithms among practitioners, and reliably achieve good predictive performance across several application areas. This has led to considerable interest in using random forests for doing science, or drawing statistical inferences in problems that do not reduce immediately to prediction. As a step in this direction, this thesis studies how random forests can be used for understanding treatment effect heterogeneity as it may arise in, e.g., personalized medicine. Our main contributions are as follows:
  - We develop a causal forest algorithm for heterogeneous treatment effect estimation, and find our method to be substantially more powerful at identifying treatment heterogeneity than traditional methods based on nearest-neighbor matching, especially when the number of considered covariates is large.
  - We provide an asymptotic statistical analysis of causal forests, and prove a Gaussian limit result. We then propose a practical method for estimating the noise scale of causal forests, thus allowing for valid statistical inference with causal forests.
  - In a high-dimensional regime where the problem complexity and the number of observations jointly approach infinity, we identify the signal strength at which tree-based methods become able to accurately detect treatment heterogeneity. Perhaps strikingly, we find that the required signal strength only scales logarithmically in the dimension of the problem.

  Taken together, these results show that random forests -- despite often being understood as a mere black box predictive algorithm -- provide a powerful toolbox for heterogeneous treatment effect estimation in modern large-scale problems.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Stanford University, Department of Statistics.
|Statement of responsibility
|Submitted to the Department of Statistics.
|Thesis (Ph.D.)--Stanford University, 2016.
- © 2016 by Stefan De Treville Wager