Discovery and visualization of latent structure with applications to the microbiome


Abstract/Contents

Abstract
Human microbiomes -- the collections of bacteria living around and within the human body -- are complex ecological systems, and describing their structure and function in different contexts is important from both basic scientific and medical perspectives. Viewed through a statistical lens, many microbiome analyses are framed in terms of discovering and describing latent structure. For example, this structure might reflect sudden environmental shocks that affect certain subsets of species, or it may illuminate gradual shifts in community composition. In this thesis, we survey and develop ideas from the data visualization and probabilistic modeling literatures that we have found useful in identifying and characterizing such structure in the microbiome. On the data visualization front, we discuss the focus-plus-context and linking principles and describe new R packages that use these ideas to facilitate visualization of hierarchical collections of time series. These tools streamline the navigation of complex data, guiding researchers towards plausible statistical models. We then turn our attention to modeling, motivated by the fact that microbiome species abundance data often have effectively low-dimensional evolutionary, temporal, and count structure. Next, we characterize and review methods appropriate for three classes of common microbiome data analysis problems -- dimensionality reduction, multitable integration, and regime detection. For dimensionality reduction, we explore basic probabilistic latent variable models, focusing on mixed-membership and matrix factorization techniques. For multitable integration, we contrast nonparametric ordination, structured regularization, and probabilistic modeling approaches. For regime detection, we compare variants of hidden Markov, dynamical systems, and changepoint models, along with baselines that do not account for temporal structure.
Throughout, we illustrate visualization and modeling techniques using real human gut microbiome data. Code and data for all experiments are available publicly online.

Description

Type of resource: text
Form: electronic; electronic resource; remote
Extent: 1 online resource.
Publication date: 2018
Issuance: monographic
Language: English

Creators/Contributors

Associated with: Sankaran, Kris
Associated with: Stanford University, Department of Statistics.
Primary advisor: Holmes, Susan, 1954-
Thesis advisor: Holmes, Susan, 1954-
Thesis advisor: Efron, Bradley
Thesis advisor: Switzer, Paul
Advisor: Efron, Bradley
Advisor: Switzer, Paul

Subjects

Genre: Theses

Bibliographic information

Statement of responsibility: Kris Sankaran.
Note: Submitted to the Department of Statistics.
Thesis: Thesis (Ph.D.)--Stanford University, 2018.
Location: electronic resource

Access conditions

Copyright
© 2018 by Kris Sankaran
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
