Statistical models for phenotypic and genotypic expression

Hussami, Nadine; Stanford University, Department of Electrical Engineering.

Abstract/Contents

Abstract: Over the last decades, genomic data has become significantly cheaper to produce and more ubiquitous. The analysis of such data can shed light on the functioning of organisms and how phenotypes are encoded in the genetic material of each cell. It is thus of interest to create informative statistical models of cellular phenotypes, such as gene expression, using this genomic data. In this work, we develop models to understand how gene expression is regulated by the genetic code and key regulatory proteins, and how these regulatory programs give rise to diverse phenotypes.

The first part of the thesis focuses on inferring phenotypic traits directly from genotypic variants. To this end we introduce a new sparse regression method called the component lasso. The method is suited for datasets with highly correlated groups of variables, which often occur in genetics. In particular, we consider predicting traits from correlated sets of mutations in genes. The method estimates and uses the connected-components structure of the sample covariance matrix during inference to achieve a lower mean squared error as well as better support recovery. We evaluate the performance of the component lasso on simulated and real data examples.

In the second part of the thesis, the focus is on the problem of genotypic expression and methods for modeling gene regulatory networks in different cells. We assume a simplified model in which mechanisms responsible for gene expression involve only two main elements: 1) transcription factors that bind to the DNA molecule; and 2) motifs that exist in the regulatory regions of genes. We first present a solution within a boosting framework that represents a regulatory network with alternating decision trees. We use the cell differentiation hierarchy to infer different networks for different cell types, while restricting the differences for models of closely related cells. We evaluate the boosting method on simulated data as well as on a real hematopoiesis dataset that has an inherent hierarchy over blood cells that stem from a single progenitor. We then present a deep learning approach for the classification of gene expression. We use a multimodal neural network on the raw DNA sequence and regulator expression data, which allows us to automatically discover relevant and new motif sequences.
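
The connected-components idea mentioned in the abstract can be pictured with a small sketch. This is not the thesis's estimator: it assumes, purely for illustration, that two variables are linked whenever the absolute value of their sample covariance exceeds a fixed cutoff, and it then groups variables into the connected components of that graph. The names `sample_covariance`, `covariance_components`, and `threshold` are hypothetical, and the per-component regression step is left out.

```python
from itertools import combinations


def sample_covariance(X):
    """Sample covariance matrix of the columns of X (a list of equal-length rows)."""
    n = len(X)
    p = len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            s = sum((row[j] - means[j]) * (row[k] - means[k]) for row in X)
            cov[j][k] = cov[k][j] = s / (n - 1)
    return cov


def covariance_components(cov, threshold):
    """Group variable indices into connected components.

    Assumption for this sketch: variables j and k share an edge when
    |cov[j][k]| > threshold. The thesis may estimate the structure differently.
    """
    p = len(cov)
    parent = list(range(p))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for j, k in combinations(range(p), 2):
        if abs(cov[j][k]) > threshold:
            parent[find(j)] = find(k)

    groups = {}
    for j in range(p):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())
```

Given a data matrix `X` with one row per sample and one column per variant, `covariance_components(sample_covariance(X), threshold)` returns lists of column indices. A sparse regression could then be fit separately within each group, which is the general setting the abstract describes for correlated sets of mutations.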

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2017
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Hussami, Nadine
Associated with	Stanford University, Department of Electrical Engineering.
Primary advisor	Kundaje, Anshul, 1980-
Primary advisor	Tibshirani, Robert
Thesis advisor	Kundaje, Anshul, 1980-
Thesis advisor	Tibshirani, Robert
Thesis advisor	Duchi, John
Thesis advisor	Weissman, Tsachy
Advisor	Duchi, John
Advisor	Weissman, Tsachy

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Nadine Hussami.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2017.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
