Essays on the econometrics of causal inference, resampling and spatial dependence
- This thesis is a collection of four papers covering all the research in econometrics that I carried out during my graduate studies at Stanford. The first and second chapters study causal inference in regression discontinuity designs. In recent years, numerous studies have employed regression discontinuity designs with many cutoffs assigning individuals to heterogeneous treatments. A common practice is to normalize all of the cutoffs to zero and estimate only one effect. This procedure identifies the average of local treatment effects weighted by the observed relative density of individuals at the existing cutoffs. However, researchers often want to make inferences on more meaningful average treatment effects (ATEs) computed over general counterfactual distributions of individuals rather than simply the observed distribution of individuals local to existing cutoffs. In the first chapter, we propose a root-n consistent and asymptotically normal estimator for such ATEs when heterogeneity follows a smooth nonparametric function of cutoff characteristics. In the case of parametric heterogeneity, observations are optimally combined to minimize the mean squared error of the ATE estimator. Inference results are also provided for the fuzzy regression discontinuity case, where the parametric heterogeneity assumption yields identification of treatment effects on individuals who comply with at least one of the multiple treatments.
In the second chapter, we focus on fuzzy regression discontinuity (FRD) designs with one cutoff. Many empirical studies use FRD designs to identify treatment effects when the receipt of treatment is potentially correlated with outcomes. Existing FRD methods identify the local average treatment effect (LATE) on the subpopulation of compliers with values of the forcing variable equal to the threshold. We develop methods that assess the plausibility of generalizing LATE to subpopulations other than compliers, and to subpopulations other than those with forcing variable equal to the threshold. Specifically, we focus on testing the equality of the distributions of potential outcomes for treated compliers and always-takers, and for non-treated compliers and never-takers. We show that equality of these pairs of distributions implies that the expected outcome conditional on the forcing variable and the treatment status is continuous in the forcing variable at the threshold, for each of the two treatment regimes. As a matter of routine, we recommend that researchers present graphs with estimates of these two conditional expectations in addition to graphs with estimates of the expected outcome conditional on the forcing variable alone. We illustrate our methods using data on the academic performance of students attending the summer school program in two large school districts in the US.
In the third chapter, we propose a fast resample method for two-step nonlinear parametric and semiparametric models. Our resample method is faster than standard methods because it does not require recomputation of the second-stage estimator during each resample iteration. The fast resample method directly exploits the score function representations computed on each bootstrap sample, thereby reducing computational time considerably. This method is used to approximate the limit distribution of parametric and semiparametric estimators, possibly simulation-based, that admit an asymptotically linear representation. Monte Carlo experiments demonstrate the desirable performance and vast improvement in the numerical speed of the fast bootstrap method.
Finally, the fourth chapter studies the effects of spatially correlated data on count data regressions. Count data regressions are an important tool for empirical analyses ranging from analyses of patent counts to measures of health and unemployment. Along with negative binomial models, Poisson panel regressions are a preferred method of analysis because the Poisson conditional fixed effects maximum likelihood estimator (PCFE) and its sandwich variance estimator are consistent even if the data are not Poisson-distributed, or if the data are correlated over time. Analyses of counts may, however, also be affected by correlation in the cross-section. For example, patent counts or publications may increase across related research fields in response to common shocks. The fourth chapter shows that the PCFE and its sandwich variance estimator are consistent in the presence of such dependence in the cross-section, as long as the spatial dependence is time-invariant. We develop a test for time-invariant spatial dependence and provide code in Stata and MATLAB to implement the test.
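The resampling idea described for the third chapter can be illustrated with a minimal sketch. It is not the thesis's two-step estimator: as a stand-in, it uses ordinary least squares, whose asymptotically linear representation is simple to write down. The function names (`ols_with_influence`, `fast_resample`) and the simulated data are hypothetical and chosen only for illustration. The estimator is computed once; each resample then averages the already-computed influence-function terms over the resampled observations instead of re-estimating the model.

```python
import numpy as np

def ols_with_influence(X, y):
    """Return the OLS estimate and each observation's estimated influence term."""
    n = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    scores = X * resid[:, None]            # per-observation score x_i * e_i
    h_inv = np.linalg.inv(X.T @ X / n)     # inverse of the average Hessian (symmetric)
    influence = scores @ h_inv             # row i is H^{-1} x_i e_i
    return beta_hat, influence

def fast_resample(beta_hat, influence, n_boot, rng):
    """Draw resampled estimates without re-estimating the model.

    Each draw is beta_hat plus the mean influence term over a bootstrap sample.
    """
    n = influence.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))    # resample indices with replacement
    return beta_hat + influence[idx].mean(axis=1)

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_hat, influence = ols_with_influence(X, y)
draws = fast_resample(beta_hat, influence, n_boot=500, rng=rng)
boot_se = draws.std(axis=0, ddof=1)               # resampling standard errors

# Analytic sandwich standard errors built from the same influence terms.
sandwich_se = np.sqrt(np.diag(influence.T @ influence) / n**2)
```

In this sketch the resampling standard errors in `boot_se` should be close to `sandwich_se`, since both are built from the same influence terms; the cost per resample is an average over stored rows rather than a new optimization.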
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Bertanha, Marinho Angelo
|Stanford University, Department of Economics.
|Hoxby, Caroline Minter
|Statement of responsibility
|Marinho Angelo Bertanha.
|Submitted to the Department of Economics.
|Thesis (Ph.D.)--Stanford University, 2015.
- © 2015 by Marinho Angelo Bertanha
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).