Unsupervised learning across multiple datasets
Abstract/Contents
- Abstract
- Subtypes define distinctive subgroups of objects found within a larger cohort; these subtypes can help domain experts define actionable recommendations for each subgroup to improve outcomes. With the relatively recent explosion of large datasets with large numbers of features, a popular way to define subtypes is to use unsupervised learning, or clustering, algorithms. Unfortunately, unsupervised learning algorithms have a serious drawback: there is no ground truth. A set of clusters may correlate strongly with an outcome (response) variable, but no such variable is used by an unsupervised learning algorithm; the accuracy of clusters derived from such algorithms therefore cannot, by nature, be quantified. One way to ensure subtypes represent true signal is to conduct the clustering analysis on multiple datasets. However, there is a lack of methods for unsupervised learning across multiple datasets. In this dissertation, I propose novel methods for unsupervised clustering across multiple datasets that find a consensus across the clusters derived from each individual dataset. I propose an algorithm, COINCIDE, that encompasses these novel methods; COINCIDE interprets each cluster as a node in a network. I apply COINCIDE to cancer gene expression and pathology datasets, and finally to sepsis gene expression datasets, to illustrate its ability to conduct unsupervised learning across multiple datasets and discover robust subtypes.
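The idea of treating each per-dataset cluster as a node in a network can be illustrated with a minimal sketch. This is not the COINCIDE algorithm itself, whose details are not given in this record. It assumes, purely for illustration, that each cluster is summarized by a centroid over shared features, that clusters from different datasets are linked when their centroids have a Pearson correlation above a chosen threshold, and that connected components of the resulting network are read as candidate consensus subtypes. The names `pearson`, `consensus_groups`, and `threshold` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant vector has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def consensus_groups(centroids, threshold=0.8):
    """Group clusters from different datasets into candidate consensus subtypes.

    centroids: dict mapping (dataset_name, cluster_id) -> list of feature means.
    Assumption: an edge joins two clusters from *different* datasets when their
    centroid correlation is at least `threshold`.
    """
    nodes = list(centroids)
    neighbors = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        different_dataset = a[0] != b[0]
        if different_dataset and pearson(centroids[a], centroids[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Connected components of the cluster network, found by depth-first search.
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(component)
    return groups


if __name__ == "__main__":
    # Toy centroids (made-up numbers) for clusters from three datasets.
    toy = {
        ("A", 0): [1.0, 2.0, 3.0, 4.0],
        ("A", 1): [4.0, 3.0, 2.0, 1.0],
        ("B", 0): [1.1, 2.1, 2.9, 4.2],
        ("B", 1): [3.9, 3.1, 2.2, 0.8],
        ("C", 0): [2.0, 1.0, 2.0, 1.0],
    }
    for group in consensus_groups(toy):
        print(sorted(group))
```

In this toy input, a cluster that has no close counterpart in any other dataset ends up alone in its own component, which is one simple way to flag a subtype that is not reproduced across datasets.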
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2015 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Planey, Katie | |
---|---|---|
Associated with | Stanford University, Program in Biomedical Informatics. | |
Primary advisor | Gevaert, Olivier Michel Simonne | |
Thesis advisor | Gevaert, Olivier Michel Simonne | |
Thesis advisor | Musen, Mark A | |
Thesis advisor | Salzman, Julia | |
Advisor | Musen, Mark A | |
Advisor | Salzman, Julia |
Subjects
Genre | Theses |
---|
Bibliographic information
Statement of responsibility | Katie Planey. |
---|---|
Note | Submitted to the Program in Biomedical Informatics. |
Thesis | Thesis (Ph.D.)--Stanford University, 2015. |
Location | electronic resource |
Access conditions
- Copyright
- © 2015 by Catherine RoseMary Planey
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).