Topics in unsupervised learning : feature selection and multi-modality


Abstract/Contents

Abstract
In the unsupervised setting, one often clusters data in an attempt to learn the unobserved latent class variable. Proper inference requires determining both the correct number of clusters and the subset of features that depend on the class variable. In the supervised setting, prediction error guides this decision-making process. An analog for unsupervised data is prediction strength (Tibshirani and Walther, 2005), which estimates this error by measuring cluster stability. Although prediction strength was originally proposed as a method for determining the number of clusters, we show that it can also be used for feature selection. Additionally, when feature selection is posed as a model selection problem, one can compute the likelihood that a feature depends on the latent variable. As the dimensionality of the problem grows, sampling models must be approached with care, which motivates a survey of various sampling methods. The second part of the thesis considers low-dimensional projections of the data via principal curves (Hastie and Stuetzle, 1989) as a vehicle for determining the number of clusters. In the low-dimensional setting (often a single dimension), the investigation of multi-modality is simplified, resulting in flexible estimation of the actual number of clusters.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Copyright date 2011
Publication date 2010, c2011; 2010
Issuance monographic
Language English

Creators/Contributors

Associated with Ahmed, Murat Omer
Associated with Stanford University, Department of Statistics
Primary advisor Walther, Guenther
Thesis advisor Walther, Guenther
Thesis advisor Lai, T. L
Thesis advisor Tibshirani, Robert

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Murat Ömer Ahmed.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2011.
Location electronic resource

Access conditions

Copyright
© 2011 by Murat Omer Ahmed
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
