Leveraging similarity in statistical learning
- The machine learning literature has provided a toolbox of techniques for general use by the modern applied statistician. For some data analyses, these methods can be tailored to the properties of the data in question to yield better results. In this manuscript, we develop methodology for three different problems in which similarity, in some sense, can be leveraged to make better predictions, as demonstrated in baseball and epidemiology applications. First, we propose adding a penalty on the nuclear norm of the regression coefficient matrix in multinomial regression to learn which outcomes are similar. Second, we propose adding a clustering step to ℓ1-penalized regression, to build customized training sets of observations that are similar to the test set. Third, we propose and evaluate several approaches to mining electronic medical records to support physician decision-making with data on patients similar to a patient in question. This last problem is especially challenging because of the observational and high-dimensional nature of the data.
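To make the first proposal concrete, here is a minimal sketch of multinomial regression with a nuclear-norm penalty on the coefficient matrix, fitted by proximal gradient descent. It is an illustration under stated assumptions, not the thesis's implementation: the function names, the fixed step size, the iteration count, and the choice of proximal gradient as the solver are all assumptions made here. The penalty shrinks the singular values of the coefficient matrix, which encourages a low-rank fit in which similar outcomes share coefficient structure.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax, shifted for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def svd_soft_threshold(B, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def objective(B, X, Y, lam):
    """Mean multinomial negative log-likelihood plus lam * nuclear norm of B."""
    Z = X @ B
    Zmax = Z.max(axis=1, keepdims=True)
    log_norm = Zmax[:, 0] + np.log(np.exp(Z - Zmax).sum(axis=1))
    nll = (log_norm - (Z * Y).sum(axis=1)).mean()
    return nll + lam * np.linalg.svd(B, compute_uv=False).sum()


def fit_nuclear_multinomial(X, Y, lam, n_iter=500):
    """Hypothetical helper: proximal gradient on the penalized objective.

    X is an (n, p) feature matrix, Y an (n, K) one-hot outcome matrix.
    Returns a (p, K) coefficient matrix.
    """
    n, p = X.shape
    K = Y.shape[1]
    # Conservative step size: the gradient of the mean negative
    # log-likelihood is Lipschitz with constant at most ||X||_2^2 / n.
    step = n / np.linalg.norm(X, 2) ** 2
    B = np.zeros((p, K))
    for _ in range(n_iter):
        grad = X.T @ (softmax(X @ B) - Y) / n
        B = svd_soft_threshold(B - step * grad, step * lam)
    return B
```

With a very large penalty weight the fitted matrix collapses to zero, and as the weight decreases the rank of the fit can grow; inspecting the singular vectors of the result is one way to see which outcomes the model treats as similar.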
|Type of resource|electronic; electronic resource; remote
|1 online resource.
|Powers, Scott Stephen
|Stanford University, Department of Statistics.
|Statement of responsibility|Scott Stephen Powers.
|Submitted to the Department of Statistics.
|Thesis (Ph.D.)--Stanford University, 2017.
- © 2017 by Scott Stephen Powers
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).