Data exploration with multi-persistent density clustering

Abstract/Contents

Abstract
This research proposes the multi-persistent density clustering method as a data exploration and profiling tool that enables users to easily investigate the clustering structure in the data. The main procedure is to construct different versions of the neighborhood graph from the data, corresponding to different density level sets and different scale parameters. The persistence information of each cluster is then summarized in a persistence diagram by tracing the evolution of the connected components of those neighborhood graphs, and this information can be used to discover reliable clustering structure. The method extends the goal of scale persistence found in most existing persistence-based topological data analysis approaches, so that the results can be robust to the presence of noise and outliers. The method is related to single linkage hierarchical clustering and to cluster tree analysis, but it does not suffer from their main shortcomings, such as the chaining problem of single linkage. In contrast to most clustering approaches, the method does not require users to provide information about the clustering structure that is usually unknown before running the algorithm, such as the number of target clusters.

The second part of this dissertation demonstrates four applications of multi-persistent density clustering: exploring metastable states in molecular dynamics simulation data, studying treatment variations of breast cancer from electronic health records, discovering disease subtypes from gene expression data, and investigating research topics from dissertation titles and abstracts. The demonstrations show that the proposed method can be applied to various kinds of data and can discover clustering structures with different characteristics.
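
The main procedure named in the abstract (neighborhood graphs built over several density levels and several scales, followed by tracking connected components) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the k-nearest-neighbor distance used as a density proxy, the "keep the densest fraction" filter, and the function names `knn_distance`, `count_components`, and `component_grid` are all assumptions made for illustration. The sketch only counts components on a grid of parameters and does not build a persistence diagram.

```python
import math
from itertools import combinations


def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (small = dense).

    Assumption: used here as a simple stand-in for a density estimate.
    """
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def count_components(points, scale):
    """Count connected components of the graph linking points within `scale`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= scale:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


def component_grid(points, k, keep_fractions, scales):
    """Component counts for every (density level, scale) pair.

    Each density level keeps only the densest fraction of the points
    (a hypothetical way to form a density level set).
    """
    proxy = knn_distance(points, k)
    order = sorted(range(len(points)), key=lambda i: proxy[i])
    grid = {}
    for frac in keep_fractions:
        n_keep = max(1, round(frac * len(points)))
        kept = [points[i] for i in order[:n_keep]]
        for scale in scales:
            grid[(frac, scale)] = count_components(kept, scale)
    return grid


# Toy data: two tight groups plus one isolated outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (2.5, 9.0)]
grid = component_grid(points, k=2, keep_fractions=[0.9, 1.0],
                      scales=[0.2, 3.0, 10.0])
for key in sorted(grid):
    print(key, grid[key])
```

In this toy example, dropping the least dense point leaves two components at both of the smaller scales, while keeping every point adds the outlier as a third component. A count that stays stable across several scales and density levels is the kind of signal the abstract describes as persistence, though the dissertation's actual summary is the persistence diagram rather than this grid.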

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2013
Issuance monographic
Language English

Creators/Contributors

Associated with Chang, Huang-Wei
Associated with Stanford University, Institute for Computational and Mathematical Engineering.
Primary advisor Carlsson, Gunnar
Thesis advisor Carlsson, Gunnar
Thesis advisor Guibas, Leonidas J
Thesis advisor Pande, Vijay

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Huang-Wei Chang.
Note Submitted to the Institute for Computational and Mathematical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2013.
Location electronic resource

Access conditions

Copyright
© 2013 by Huang-Wei Chang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
