Leveraging similarity in statistical learning


Abstract/Contents

Abstract: The machine learning literature has provided a toolbox of techniques for general use by the modern applied statistician. In some data analyses, these methods can be improved to yield better results by exploiting properties of the data in question. In this manuscript, we develop methodology for three different problems in which, in some sense, similarity can be leveraged to make better predictions, as demonstrated in baseball and epidemiology applications. First, we propose adding a penalty on the nuclear norm of the regression coefficient matrix in multinomial regression to learn which outcomes are similar. Second, we propose adding a clustering step to ℓ1-penalized regression to build customized training sets of observations that are similar to the test set. Third, we propose and evaluate several approaches to mining electronic medical records to support physician decision-making with data on patients similar to the patient in question. This last problem is especially challenging because of the observational and high-dimensional nature of the data.
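For readers unfamiliar with the first method, one standard way to write a nuclear-norm-penalized multinomial regression objective is sketched below. The notation is ours and is an assumption, not necessarily the thesis's exact parameterization. Here x_i is a p-vector of predictors, y_i is one of K outcome classes, β_k is the k-th column of the p × K coefficient matrix B, β_0k is the intercept for class k, σ_j(B) are the singular values of B, and λ ≥ 0 is a tuning parameter.

```latex
\min_{\beta_0 \in \mathbb{R}^{K},\; B \in \mathbb{R}^{p \times K}}
  \; -\frac{1}{n} \sum_{i=1}^{n}
  \log \frac{\exp\!\left(\beta_{0 y_i} + x_i^\top \beta_{y_i}\right)}
            {\sum_{k=1}^{K} \exp\!\left(\beta_{0k} + x_i^\top \beta_k\right)}
  \; + \; \lambda \lVert B \rVert_*,
\qquad
\lVert B \rVert_* = \sum_{j} \sigma_j(B)
```

The nuclear norm is a convex surrogate for rank, so larger λ pushes B toward low rank. With a low-rank B, the K class coefficient vectors lie in a shared low-dimensional subspace, which is one way of expressing that some outcomes behave similarly.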

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2017
Issuance	monographic
Language	English

Associated with	Powers, Scott Stephen
Associated with	Stanford University, Department of Statistics.
Primary advisor	Tibshirani, Robert
Thesis advisor	Tibshirani, Robert
Thesis advisor	Friedman, Jerome
Thesis advisor	Hastie, Trevor
Advisor	Friedman, Jerome
Advisor	Hastie, Trevor

Genre	Theses

Statement of responsibility	Scott Stephen Powers.
Note	Submitted to the Department of Statistics.
Thesis	Thesis (Ph.D.)--Stanford University, 2017.
Location	electronic resource

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
