# A lasso for hierarchical interactions

## Abstract/Contents

- Abstract
- Building predictive interaction models is an important yet challenging problem, especially when the number of variables is large. Statisticians commonly demand that an interaction only be included in a model if both variables are marginally important. We study the problem of identifying hierarchical two-way interaction models from the viewpoint of the Lasso (i.e., L1-penalized regression). We show that we can produce sparse interaction models that honor the hierarchy restriction by adding a set of convex constraints to the Lasso problem. In contrast to stepwise procedures that are most commonly used for building interaction models, our formulation is convex, and so its solution is completely characterized by a set of optimality conditions. This makes it easier to study as a statistical estimator and gives a precise characterization of the effect of the hierarchy restriction. We prove under mild conditions that hierarchy holds with probability one and derive an unbiased estimate for the degrees of freedom of our estimator. A simple bound on this estimate gives a sense of the amount of fitting "saved" by the hierarchy constraint. We distinguish between two types of sparsity: parameter sparsity -- the number of nonzero coefficients in the model -- and practical sparsity -- the number of raw variables one needs to measure to make predictions in the future. While most statistical procedures focus on the former, the restriction to sparse hierarchical interactions gets at the latter, which is the quantity more closely tied to important data collection concerns such as cost, time, and effort. A simulation study reveals the relative statistical merits of the hierarchy assumption in different settings. We demonstrate our method in both linear and logistic regression settings on an HIV-1 drug resistance dataset and a classification problem involving olive oil. Our method has potential applications in genome-wide association studies and other situations in which interactions may be important. Finally, we describe an algorithm that forms the basis of the hierNet package that we have created in R.
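The two notions of sparsity distinguished in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm and not the R package's interface; the function names, the coefficient layout (a main-effect vector `beta` and a symmetric interaction matrix `theta` with zero diagonal), and the toy numbers are all assumptions made for illustration. It counts nonzero coefficients, counts the raw variables a model actually needs, and checks whether every interaction has both of its main effects present.

```python
import numpy as np

# Hypothetical layout (an assumption, not taken from the thesis):
#   beta  : length-p vector of main-effect coefficients
#   theta : p x p symmetric matrix of interaction coefficients, zero diagonal

def parameter_sparsity(beta, theta):
    """Nonzero coefficients: main effects plus distinct pairwise interactions."""
    upper = np.triu(theta, k=1)  # count each pair (j, k) once
    return int(np.count_nonzero(beta) + np.count_nonzero(upper))

def practical_sparsity(beta, theta):
    """Raw variables that must be measured to make a prediction."""
    in_interaction = (theta != 0).any(axis=0) | (theta != 0).any(axis=1)
    return int(((beta != 0) | in_interaction).sum())

def obeys_hierarchy(beta, theta):
    """True if every nonzero interaction has both main effects nonzero."""
    j, k = np.nonzero(np.triu(theta, k=1))
    return bool(np.all((beta[j] != 0) & (beta[k] != 0)))

p = 5
beta = np.array([1.0, -2.0, 0.0, 0.0, 0.0])

# Interaction between variables 0 and 1, both of which have main effects.
theta_hier = np.zeros((p, p))
theta_hier[0, 1] = theta_hier[1, 0] = 0.5

# Interaction between variables 2 and 3, neither of which has a main effect.
theta_free = np.zeros((p, p))
theta_free[2, 3] = theta_free[3, 2] = 0.5

for name, theta in [("hierarchical", theta_hier), ("unrestricted", theta_free)]:
    print(name,
          parameter_sparsity(beta, theta),
          practical_sparsity(beta, theta),
          obeys_hierarchy(beta, theta))
```

In this toy case both models have three nonzero coefficients, but the hierarchical one needs only two raw variables while the unrestricted one needs four, which is the gap between parameter sparsity and practical sparsity that the abstract describes.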

## Description

Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2012 |
Issuance | monographic |
Language | English |

## Creators/Contributors

Associated with | Bien, Jacob |
---|---|
Associated with | Stanford University, Department of Statistics |
Primary advisor | Tibshirani, Robert |
Thesis advisor | Tibshirani, Robert |
Thesis advisor | Hastie, Trevor |
Thesis advisor | Taylor, Jonathan E |
Advisor | Hastie, Trevor |
Advisor | Taylor, Jonathan E |

## Subjects

Genre | Theses |
---|

## Bibliographic information

Statement of responsibility | Jacob Bien. |
---|---|
Note | Submitted to the Department of Statistics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2012. |
Location | electronic resource |

## Access conditions

- Copyright
- © 2012 by Jacob Bien
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
