Large scale causal inference with machine learning

Abstract/Contents

Abstract
This thesis focuses on large-scale causal inference using machine learning techniques. It comprises three chapters.
The first chapter, "Average Treatment Effect Estimation in High Dimensional Observational Data: Practical Recommendations," is co-written by the author and Guido Imbens. It is motivated by the great interest in estimating average treatment effects (ATEs) in high-dimensional observational data. Following this trend, many researchers have proposed ATE estimators for this setting and have shown the superior performance of their own methods through self-designed simulations. However, it remains unclear which estimators are best to use on a new dataset. This matters because, in many cases, different estimators return widely different estimates. In this chapter, we first review a rich list of ATE estimators for high-dimensional non-experimental data. We then systematically simulate data to compare those methods, treating the simulated datasets as observational. Last, we provide a procedure based on diagnostic tests that helps applied researchers decide which estimators to use and which estimates are the most credible. We test the proposed procedure on simulated data as well as real data, and observe that it works well on the real data and on most of the simulated data. Future work would adjust the procedure to fit more datasets and to handle very large data for which computation time is a concern.
The second chapter, "A Deep Causal Inference Approach to Measuring the Effects of Forming Group Loans in Online Non-profit Microfinance Platform," is co-written by the author and Yuanyuan Shen. In this chapter we investigate Kiva, an online non-profit crowdfunding microfinance platform that raises funds for the poor. The borrowers on Kiva are individuals in urgent need of money. To raise funds as fast as possible, they have the option to form groups to request loans. While it is generally believed that group loans pose less risk for investors than individual loans do, we study whether this is the case in a philanthropic online marketplace. In particular, we measure the effect of group loans on funding time while controlling for loan size and other factors. Because loan descriptions (in the form of text) play an important role in lenders' decision process on Kiva, we make use of this information through deep learning for natural language processing. We find that, on average, forming group loans shortens the funding time by at least two days. Besides baseline models, we use three advanced estimators combined with deep learning techniques to estimate the effect of interest. In future work, we would like to use more estimators and apply the diagnostic tests from the first chapter to choose among them. Since the data is very large, we may apply the diagnostic tests to a small random subset of it. We would also like to estimate the treatment effect on subgroups of the data, such as loans in the agriculture sector.
The last chapter is "The Supervised Learning Approach To Estimating Heterogeneous Causal Regime Effects." In this chapter, we develop a nonparametric framework that uses supervised learning to estimate heterogeneous treatment regime effects from observational or experimental data. A treatment regime is a set of sequential treatments. The main idea is to transform the unobserved variable measuring a treatment regime effect into a new observed entity through a weight that can be estimated from the data. With the new "transformed" entity, we can estimate causal regime effects with high accuracy using techniques from the machine learning literature. Our proposed method performs well with sequential multi-valued treatments. It also works for both observational and experimental data, which is useful because sequential randomized experiments are generally hard to set up in practice. The method's advantage comes from the predictive accuracy of the machine learning approach. Unlike most traditional approaches, our method does not rely on parametric form assumptions, so the causal effect estimates it produces are robust to functional form misspecification. We demonstrate the effectiveness of the proposed method through simulations. Finally, we apply our method to the North Carolina Honors Program data to evaluate the effect of the honors program on students' performance over multiple years.
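
The transformation idea described for the last chapter can be illustrated with a minimal sketch. This is not the thesis's sequential multi-valued procedure; it is the familiar single-period, binary-treatment special case, in which the transformed outcome Y* = Y (W - e(X)) / (e(X) (1 - e(X))) has conditional mean equal to the conditional treatment effect. The simulated data, the known propensity of 0.5 (as in a randomized experiment), the linear effect 1 + 2x, and the function name `transformed_outcome` are all assumptions made for illustration.

```python
import numpy as np


def transformed_outcome(y, w, e):
    """Return Y* = Y * (W - e) / (e * (1 - e)).

    Under unconfoundedness with propensity e = P(W = 1 | X),
    E[Y* | X] equals the conditional treatment effect.
    """
    return y * (w - e) / (e * (1.0 - e))


# Hypothetical simulated data (not from the thesis).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)          # one covariate
e = np.full(n, 0.5)                        # known propensity (assumed)
w = rng.binomial(1, e)                     # binary treatment indicator
tau = 1.0 + 2.0 * x                        # true effect, so the ATE is 2
y = x + w * tau + rng.normal(0.0, 1.0, size=n)

# Y* is observed for every unit, so any supervised learner can regress it
# on x. A straight-line fit stands in for a machine learning model here.
y_star = transformed_outcome(y, w, e)
slope, intercept = np.polyfit(x, y_star, deg=1)
ate_hat = y_star.mean()

print(f"ATE estimate: {ate_hat:.3f}")
print(f"fitted effect: {intercept:.3f} + {slope:.3f} * x")
```

With observational data the propensity would have to be estimated rather than assumed known, and the individual transformed outcomes are noisy, so a sketch like this relies on a large sample.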

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2017
Issuance monographic
Language English

Creators/Contributors

Associated with Phạm, Thái
Associated with Stanford University, Graduate School of Business.
Primary advisor Imbens, Guido
Thesis advisor Imbens, Guido
Thesis advisor Bayati, Mohsen
Thesis advisor Hong, Han
Advisor Bayati, Mohsen
Advisor Hong, Han

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Thai Pham.
Note Submitted to the Graduate School of Business.
Thesis Thesis (Ph.D.)--Stanford University, 2017.
Location electronic resource

Access conditions

Copyright
© 2017 by Thai Pham
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).