Leveraging similarity in statistical learning



The machine learning literature provides a toolbox of techniques for general use by the modern applied statistician. For some data analyses, these methods can be improved by exploiting properties of the data in question. In this manuscript, we develop methodology for three problems in which similarity, in some sense, can be leveraged to make better predictions, as demonstrated in applications to baseball and epidemiology. First, we propose adding a penalty on the nuclear norm of the regression coefficient matrix in multinomial regression to learn which outcome categories are similar. Second, we propose adding a clustering step to ℓ1-penalized regression to build customized training sets of observations that are similar to the test set. Third, we propose and evaluate several approaches to mining electronic medical records to support physician decision-making with data on patients similar to a given patient. This last problem is especially challenging because the data are observational and high-dimensional.
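The first proposal, a nuclear-norm penalty on the coefficient matrix of a multinomial regression, can be illustrated with proximal gradient descent, whose proximal step soft-thresholds the singular values of the coefficient matrix. The following is a minimal NumPy sketch under stated assumptions: it minimizes the average multinomial negative log-likelihood plus λ times the nuclear norm, uses a fixed step size from a standard Lipschitz bound, and leaves the intercept unpenalized. The helper names (`softmax`, `svt`, `fit_nuclear_multinomial`) are hypothetical, and this is not necessarily the fitting algorithm used in the thesis.

```python
import numpy as np


def softmax(eta):
    # Row-wise softmax, shifted by each row's max for numerical stability.
    z = eta - eta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def svt(B, tau):
    # Proximal operator of tau * ||B||_*: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def fit_nuclear_multinomial(X, y, K, lam, n_iter=500):
    """Proximal gradient for (1/n) * multinomial NLL + lam * ||B||_* (hypothetical helper).

    X: (n, p) features; y: (n,) integer labels in {0, ..., K-1}.
    Returns the (p, K) coefficient matrix B and an unpenalized (K,) intercept b0.
    """
    n, p = X.shape
    Y = np.eye(K)[y]  # one-hot responses, shape (n, K)
    B = np.zeros((p, K))
    b0 = np.zeros(K)
    # Fixed step 1/L: the softmax Hessian has spectral norm <= 1/2, so the
    # gradient of the smooth part is L-Lipschitz with L = ||[X, 1]||_2^2 / (2n).
    Xa = np.hstack([X, np.ones((n, 1))])
    step = 2.0 * n / np.linalg.norm(Xa, 2) ** 2
    for _ in range(n_iter):
        G = (softmax(X @ B + b0) - Y) / n  # gradient w.r.t. the linear predictor
        B = svt(B - step * (X.T @ G), step * lam)  # gradient step, then prox
        b0 = b0 - step * G.sum(axis=0)  # intercept is not penalized
    return B, b0
```

For example, `B, b0 = fit_nuclear_multinomial(X, y, K=3, lam=0.01)` returns a coefficient matrix whose rank shrinks as `lam` grows. Thresholding singular values forces the K outcome columns of B to share a few latent directions; reading outcome categories with nearby columns as "similar" is one interpretation of the idea, offered here as an assumption.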


Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource
Publication date: 2017
Issuance: monographic
Language: English


Associated with: Powers, Scott Stephen
Associated with: Stanford University, Department of Statistics
Primary advisor: Tibshirani, Robert
Thesis advisor: Tibshirani, Robert
Thesis advisor: Friedman, Jerome
Thesis advisor: Hastie, Trevor
Advisor: Friedman, Jerome
Advisor: Hastie, Trevor


Genre: Theses

Bibliographic information

Statement of responsibility: Scott Stephen Powers.
Note: Submitted to the Department of Statistics.
Thesis: Thesis (Ph.D.)--Stanford University, 2017.
Location: electronic resource

Access conditions

© 2017 by Scott Stephen Powers
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
