Unsupervised learning across multiple datasets


Abstract/Contents

Abstract
Subtypes are distinctive subgroups of objects within a larger cohort; they can help domain experts define actionable recommendations for each subgroup to improve outcomes. With the relatively recent explosion of large datasets containing large numbers of features, a popular way to define subtypes is to use unsupervised learning, or clustering, algorithms. Unfortunately, unsupervised learning algorithms have a serious drawback: there is no ground truth. While a set of clusters may correlate strongly with an outcome variable, no outcome (response) variable is used by an unsupervised learning algorithm, so the accuracy of the resulting clusters cannot, by nature, be quantified. One way to increase confidence that subtypes represent true signal is to conduct the clustering analysis on multiple datasets. However, there is a lack of methods for unsupervised learning across multiple datasets. In this dissertation, I propose novel methods for unsupervised clustering across multiple datasets that find a consensus across the clusters derived from each individual dataset. I propose an algorithm, COINCIDE, that encompasses these methods; COINCIDE interprets each cluster as a node in a network. I apply COINCIDE to cancer gene expression and pathology datasets, and then to sepsis gene expression datasets, to illustrate its ability to conduct unsupervised learning across multiple datasets and discover robust subtypes.
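
The idea of treating each cluster as a node in a network can be illustrated with a minimal sketch. This is not the COINCIDE implementation, whose details are not given in this record. It assumes that each dataset has already been clustered separately, that each cluster is summarized by a centroid over a shared feature set, and that two clusters from different datasets are linked when their centroids have a Pearson correlation above a threshold. The function names, the 0.8 threshold, and the toy centroids are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def consensus_groups(centroids, threshold=0.8):
    """Group clusters from different datasets whose centroids correlate.

    `centroids` maps (dataset_name, cluster_id) -> centroid vector over a
    shared feature set. Each key is a node; an edge joins two nodes from
    different datasets when their correlation is >= threshold (an assumed
    similarity rule). Returns the connected components as a list of sets.
    """
    nodes = sorted(centroids)
    parent = {node: node for node in nodes}

    def find(node):
        # Union-find root lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue  # only link clusters that come from different datasets
        if pearson(centroids[a], centroids[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical toy centroids: two studies, two clusters each, four features.
centroids = {
    ("study_A", 0): [2.0, 1.8, 0.1, 0.0],
    ("study_A", 1): [0.1, 0.0, 2.1, 1.9],
    ("study_B", 0): [0.2, 0.1, 1.7, 2.2],
    ("study_B", 1): [1.9, 2.1, 0.0, 0.2],
}
groups = consensus_groups(centroids)
```

In this toy input, cluster 0 of study_A and cluster 1 of study_B share a profile, as do cluster 1 of study_A and cluster 0 of study_B, so the sketch yields two cross-dataset groups. A cluster that matches nothing in any other dataset would remain a singleton group, which is one simple way to flag a candidate subtype as not reproduced.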

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Planey, Katie
Associated with Stanford University, Program in Biomedical Informatics.
Primary advisor Gevaert, Olivier Michel Simonne
Thesis advisor Gevaert, Olivier Michel Simonne
Thesis advisor Musen, Mark A
Thesis advisor Salzman, Julia
Advisor Musen, Mark A
Advisor Salzman, Julia

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Katie Planey.
Note Submitted to the Program in Biomedical Informatics.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Catherine RoseMary Planey
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).

