Scalable estimation and inference for massive linear mixed models with crossed random effects

Abstract/Contents

Abstract
With modern electronic activity, large crossed data sets with factors such as users and items are increasingly common. It is often appropriate to model them with crossed random effects, since specific levels are temporary. Their size poses challenges for statistical analysis. For such large data sets, the computational costs of estimation and inference (time, space, and communication) should grow at most linearly with the sample size, and the algorithms should be parallelizable. Both traditional maximum likelihood estimation and numerous Markov chain Monte Carlo Bayesian algorithms take superlinear time to obtain good parameter estimates in the simple two-factor crossed random effects model and in the linear mixed model with two crossed random effects. We propose moment-based, parallelizable algorithms that, at no more than linear cost, estimate regression coefficients and variance components and measure the uncertainties of those estimates. These estimates are consistent and asymptotically Gaussian. When run on simulated normally distributed data, our algorithms perform competitively with maximum likelihood methods. We apply the algorithms to real-world data from Stitch Fix in which the crossed random effects correspond to clients and items. The random effects analysis accounts for the increased variance due to intra-client and intra-item correlations in the data, whereas ignoring the correlation structure can lead to standard errors being underestimated by more than a factor of 10.
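
The abstract does not spell out the estimating equations, so the following is only a minimal sketch of what a linear-cost moment estimator could look like for the simple two-factor model Y_ij = mu + a_i + b_j + e_ij, with variance components var(a_i), var(b_j), and var(e_ij). It is an assumption for illustration, not necessarily the thesis's algorithm. The function name `moment_variance_components`, the integer coding of the factor levels, and the choice of three sums of squares (within rows, within columns, overall) are all hypothetical. Each sum of squares is computed in one pass over the data, its expectation is linear in the three variance components, and the resulting 3x3 system is solved.

```python
import numpy as np


def moment_variance_components(rows, cols, y):
    """Method-of-moments sketch for Y_ij = mu + a_i + b_j + e_ij.

    rows, cols: non-negative integer codes of the two crossed factors
                (assumed pre-coded so that counting is linear time).
    y:          observed responses, one per (row, col) pair.
    Returns estimates of [var(a), var(b), var(e)].
    """
    rows = np.asarray(rows, dtype=np.int64)
    cols = np.asarray(cols, dtype=np.int64)
    y = np.asarray(y, dtype=float)
    n = y.size

    # All statistics below are shift invariant; centering helps numerically.
    yc = y - y.mean()
    total_ss = np.sum(yc ** 2)

    def within_ss(idx):
        # Sum of squares around each level's own mean, in one pass.
        counts = np.bincount(idx)
        sums = np.bincount(idx, weights=yc)
        seen = counts > 0  # ignore level codes that never occur
        ss = total_ss - np.sum(sums[seen] ** 2 / counts[seen])
        return ss, counts[seen]

    u_a, n_row = within_ss(rows)  # expectation: (var_b + var_e) * (n - R)
    u_b, n_col = within_ss(cols)  # expectation: (var_a + var_e) * (n - C)
    u_e = n * total_ss            # n times the overall sum of squares

    r, c = n_row.size, n_col.size
    # Coefficients of (var_a, var_b, var_e) in the three expectations.
    m = np.array(
        [
            [0.0, n - r, n - r],
            [n - c, 0.0, n - c],
            [n**2 - np.sum(n_row**2), n**2 - np.sum(n_col**2), n**2 - n],
        ],
        dtype=float,
    )
    return np.linalg.solve(m, np.array([u_a, u_b, u_e]))
```

For example, `moment_variance_components([0, 0, 1, 1], [0, 1, 0, 1], [1.0, 1.0, 3.0, 3.0])` uses a complete 2x2 layout in which the response depends only on the row. Designs with a single observation in every row or every column make the system singular, and moment estimates can be negative in small samples; this sketch handles neither case.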

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2017
Issuance monographic
Language English

Creators/Contributors

Associated with Gao, Katelyn
Associated with Stanford University, Department of Statistics.
Primary advisor Owen, Art B
Thesis advisor Owen, Art B
Thesis advisor Mackey, Lester
Thesis advisor Tibshirani, Robert

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Katelyn Gao.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2017.
Location electronic resource

Access conditions

Copyright
© 2017 by Katelyn Xiang Gao
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
