A lasso for hierarchical interactions

Abstract/Contents

Abstract
Building predictive interaction models is an important yet challenging problem, especially when the number of variables is large. Statisticians commonly demand that an interaction only be included in a model if both variables are marginally important. We study the problem of identifying hierarchical two-way interaction models from the viewpoint of the Lasso (i.e., L1-penalized regression). We show that we can produce sparse interaction models that honor the hierarchy restriction by adding a set of convex constraints to the Lasso problem. In contrast to stepwise procedures that are most commonly used for building interaction models, our formulation is convex, and so its solution is completely characterized by a set of optimality conditions. This makes it easier to study as a statistical estimator and gives a precise characterization of the effect of the hierarchy restriction. We prove under mild conditions that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A simple bound on this estimate gives a sense of the amount of fitting "saved" by the hierarchy constraint. We distinguish between two types of sparsity: parameter sparsity -- the number of nonzero coefficients in the model -- and practical sparsity -- the number of raw variables one needs to measure to make predictions in the future. While most statistical procedures focus on the former, the restriction to sparse hierarchical interactions gets at the latter, which is the quantity more closely tied to important data collection concerns such as cost, time, and effort. A simulation study reveals the relative statistical merits of the hierarchy assumption in different settings. We demonstrate our method in both linear and logistic regression settings on an HIV-1 drug resistance dataset and a classification problem involving olive oil. 
Our method has potential applications in genome-wide association studies and other situations in which interactions may be important. Finally, we describe an algorithm that forms the basis of the hierNet package that we have created in R.
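
The distinction the abstract draws between parameter sparsity and practical sparsity can be illustrated with a minimal sketch. This is not code from the thesis or from the R package: the function names, the coefficient vector `beta` (main effects), and the symmetric matrix `theta` (two-way interactions, zero diagonal) are all assumptions made for illustration, as is the reading of "both variables are marginally important" as both main effects being nonzero.

```python
import numpy as np

def parameter_sparsity(beta, theta):
    """Count nonzero coefficients: main effects plus distinct interactions.

    Assumes theta is symmetric with a zero diagonal, so each interaction
    is counted once from the strict upper triangle.
    """
    return int(np.count_nonzero(beta) + np.count_nonzero(np.triu(theta, k=1)))

def practical_sparsity(beta, theta):
    """Count raw variables that must be measured to make a prediction."""
    in_interaction = (theta != 0).any(axis=0) | (theta != 0).any(axis=1)
    return int(((beta != 0) | in_interaction).sum())

def satisfies_hierarchy(beta, theta):
    """True if every nonzero interaction has both main effects nonzero."""
    j, k = np.nonzero(theta)
    return bool(np.all((beta[j] != 0) & (beta[k] != 0)))

# Hypothetical coefficients for five variables (illustrative values only).
beta = np.array([1.0, -2.0, 0.5, 0.0, 0.0])

# Model A: the single interaction involves variables 0 and 1,
# both of which already have main effects.
theta_a = np.zeros((5, 5))
theta_a[0, 1] = theta_a[1, 0] = 0.3

# Model B: the single interaction involves variables 3 and 4,
# neither of which has a main effect.
theta_b = np.zeros((5, 5))
theta_b[3, 4] = theta_b[4, 3] = 0.3

for name, theta in [("A", theta_a), ("B", theta_b)]:
    print(name,
          parameter_sparsity(beta, theta),
          practical_sparsity(beta, theta),
          satisfies_hierarchy(beta, theta))
```

Under these assumptions, both models have the same number of nonzero coefficients, but the non-hierarchical model requires measuring two additional raw variables, which is the cost the abstract associates with practical sparsity.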

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource.
Publication date: 2012
Issuance: monographic
Language: English

Creators/Contributors

Associated with: Bien, Jacob
Associated with: Stanford University, Department of Statistics
Primary advisor: Tibshirani, Robert
Thesis advisor: Tibshirani, Robert
Thesis advisor: Hastie, Trevor
Thesis advisor: Taylor, Jonathan E
Advisor: Hastie, Trevor
Advisor: Taylor, Jonathan E

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Jacob Bien.
Note: Submitted to the Department of Statistics.
Thesis: Thesis (Ph.D.)--Stanford University, 2012.
Location: electronic resource

Access conditions

Copyright
© 2012 by Jacob Bien
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
