Scalable estimation and inference for massive linear mixed models with crossed random effects
Abstract/Contents
- Abstract
- With modern electronic activity, large crossed data sets are increasingly common, with factors such as users and items. It is often appropriate to model them with crossed random effects, since specific levels are temporary. Their size poses challenges for statistical analysis. For such large data sets, the computational costs of estimation and inference (time, space, and communication) should grow at most linearly with the sample size, and the algorithms should be parallelizable. Both traditional maximum likelihood estimation and numerous Markov chain Monte Carlo Bayesian algorithms take superlinear time to obtain good parameter estimates in the simple two-factor crossed random effects model and in the linear mixed model with two crossed random effects. We propose moment-based, parallelizable algorithms that, with at most linear cost, estimate regression coefficients and variance components and measure the uncertainties of those estimates. These estimates are consistent and asymptotically Gaussian. When run on simulated normally distributed data, our algorithms perform competitively with maximum likelihood methods. We apply the algorithms to real-world data from Stitch Fix, where the crossed random effects correspond to clients and items. The random effects analysis accounts for the increased variance due to intra-client and intra-item correlations in the data; ignoring the correlation structure can lead to standard errors that are underestimated by a factor of more than 10.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2017 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Gao, Katelyn |
---|---|
Associated with | Stanford University, Department of Statistics. |
Primary advisor | Owen, Art B |
Thesis advisor | Owen, Art B |
Thesis advisor | Mackey, Lester |
Thesis advisor | Tibshirani, Robert |
Advisor | Mackey, Lester |
Advisor | Tibshirani, Robert |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Katelyn Gao. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2017. |
Location | electronic resource |
Access conditions
- Copyright
- © 2017 by Katelyn Xiang Gao
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).