Topics in unsupervised learning: feature selection and multi-modality


Abstract/Contents

Abstract: In the unsupervised setting, one often clusters data in an attempt to learn an unobserved latent class variable. Proper inference requires determining both the correct number of clusters and the subset of features that depend on the class variable. In the supervised setting, prediction error guides the decision-making process. An analog for unsupervised data is prediction strength (Tibshirani and Walther, 2005), which estimates this error by measuring cluster stability. Although prediction strength was originally proposed as a method for determining the number of clusters, we show that it can also be used for feature selection. Additionally, when feature selection is posed as a model selection problem, one can compute the likelihood that a feature depends on the latent variable. As the dimensionality of the problem grows, sampling models must be approached with care, which motivates a survey of various sampling methods. The second part of the thesis considers low-dimensional projections of the data via principal curves (Hastie and Stuetzle, 1989) as a vehicle for determining the number of clusters. In the low-dimensional setting (often a single dimension), investigating multi-modality is simpler, which allows flexible estimation of the actual number of clusters.
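For readers unfamiliar with prediction strength, here is a minimal sketch of the basic measure from Tibshirani and Walther (2005), which scores a choice of k for the number of clusters. It is not the thesis's feature-selection procedure. Assumptions: k-means is the clustering method, distance is squared Euclidean, k-means is initialized by farthest-first traversal, and a single train/test split is used. Clusters with fewer than two test points are skipped, which is also an assumption. The helper names (`kmeans`, `prediction_strength`, `make_blobs`) are hypothetical. The idea is to cluster the test half, cluster the training half, assign each test point to its nearest training centroid, and then report, for the worst test cluster, the fraction of its point pairs that the training centroids also place together.

```python
import random
from itertools import combinations

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    # Deterministic init (an assumption): start at points[0], then repeatedly
    # add the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def assign(points, centroids):
    # Index of the nearest centroid for each point.
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns the k centroids."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            group = [p for p, lab in zip(points, labels) if lab == j]
            if group:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def prediction_strength(train, test, k):
    # Clusters A_1..A_k found on the test data itself.
    test_labels = assign(test, kmeans(test, k))
    # Labels the *training* centroids give to the same test points.
    train_labels = assign(test, kmeans(train, k))
    strengths = []
    for j in range(k):
        members = [i for i, lab in enumerate(test_labels) if lab == j]
        if len(members) < 2:
            continue  # pair fraction undefined for singletons (assumption: skip)
        pairs = list(combinations(members, 2))
        same = sum(train_labels[a] == train_labels[b] for a, b in pairs)
        strengths.append(same / len(pairs))
    # Prediction strength is the minimum over test clusters.
    return min(strengths, default=0.0)

def make_blobs(centers, n_per, sd, seed=0):
    """Hypothetical toy data: isotropic Gaussian blobs in 2-D, shuffled."""
    rng = random.Random(seed)
    pts = [(cx + rng.gauss(0, sd), cy + rng.gauss(0, sd))
           for cx, cy in centers for _ in range(n_per)]
    rng.shuffle(pts)
    return pts
```

A usage sketch: split the data in half and scan k. The rule described by Tibshirani and Walther is to pick the largest k whose prediction strength exceeds a threshold. Values around 0.8 to 0.9 are commonly used.

```python
data = make_blobs([(0, 0), (10, 0), (0, 10)], n_per=60, sd=0.5)
train, test = data[::2], data[1::2]
for k in range(1, 6):
    print(k, round(prediction_strength(train, test, k), 2))
```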

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Copyright date	2011
Publication date	2010, c2011; 2010
Issuance	monographic
Language	English

Associated with	Ahmed, Murat Omer
Associated with	Stanford University, Department of Statistics
Primary advisor	Walther, Guenther
Thesis advisor	Walther, Guenther
Thesis advisor	Lai, T. L
Thesis advisor	Tibshirani, Robert
Advisor	Lai, T. L
Advisor	Tibshirani, Robert

Genre	Theses

Statement of responsibility	Murat Ömer Ahmed.
Note	Submitted to the Department of Statistics.
Thesis	Thesis (Ph.D.)--Stanford University, 2011.
Location	electronic resource

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
