Factorization methods for inferring structure in ensemble average measurements of RNA molecules and networks

Abstract/Contents

Abstract
Ribonucleic acids (RNAs) are among life's fundamental biopolymers: they can store genetic information, perform catalysis, and alter their structures and behavior in response to environmental stimuli, and they are central to critical biological processes ranging from protein translation to gene regulation. Measuring the quantities and structures of these molecules is therefore an important gateway to understanding the mechanisms that govern cell behavior. Many of the current high-throughput methodologies used to interrogate the properties of RNA molecules come in the form of ensemble average measurements: weighted projections of the multiple biological or physical states of the RNAs being probed. Inferring the properties of, and relationships between, these "hidden" states from ensemble averages gives important insights into the mechanisms and functions of RNAs in several scenarios. For example, ensemble average measurements of the structural properties of an RNA sequence in solution give a weighted projection of the properties of all the structures that the molecules adopt at equilibrium, many of which may be functionally relevant. In another scenario, we might be interested in measuring the expression levels of several genes in a pool of cells through mRNA quantification: the resulting ensemble averages will be a projection of the transcriptional programs of the cells in multiple biological states. In this dissertation, I develop methods to factorize and summarize state-wise properties of these ensemble averages. First, I present methods to infer properties of the secondary structure ensembles of RNAs using classic and readily obtainable chemical mapping measurements.
To this end, I interrogate statistical relationships between structural properties and chemical mapping measurements by creating and mining the RNA Mapping Database (RMDB), the most diverse repository of RNA structural data to date; by developing the RNA Ensemble Extraction From Footprinting Insights Technique (REEFFIT), a secondary structure landscape modeling framework that uses perturbed, multi-dimensional chemical mapping data; and by developing chemical mapping data statistics for characterizing the entropy of an RNA's structural ensemble. These methods help reveal the rich folding landscapes of non-coding RNAs, reconcile two models for the structures of the human accelerated region 1A (HAR1A) chimp and human genes, and establish the properties of the structural landscapes of random RNA sequences that serve as a useful "biological null" when characterizing novel and putatively functional RNAs. In the second part of this dissertation, I focus on examining ensemble averages of gene expression measurements by developing the Community LASSO with Intersecting Priors (CLIP), a method to jointly infer gene co-expression relationships and gene communities from gene expression values under multi-factorial perturbation schemes; and by introducing the notion of "gene roles" defined by their network and community topologies. I use CLIP and gene roles to build a transcriptional map of human heart failure using more than 300 failing and healthy human left-ventricular heart samples and pinpoint central genes and networks that appear to be involved in this devastating condition. One of these central genes, the regulatory subunit of protein phosphatase I PPP1R3A, seems to play different gene roles under healthy and pathological conditions.
I validate these inferences using whole-transcriptome timed "snapshots" of an in vitro model of hypertrophy, verifying significant cross-gene expression time dependencies in half of the gene communities inferred by CLIP, demonstrating that PPP1R3A ameliorates hypertrophy in vitro, and suggesting causal routes that this gene may take to alter hypertrophy and heart failure phenotypes.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Sánchez Cordero González, Sergio Pablo
Associated with Stanford University, Department of Biomedical Informatics.
Primary advisor Ashley, Euan A
Primary advisor Das, Rhiju
Thesis advisor Ashley, Euan A
Thesis advisor Das, Rhiju
Thesis advisor Altman, Russ
Advisor Altman, Russ

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Sergio Pablo Sánchez Cordero González.
Note Submitted to the Program in Biomedical Informatics.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Sergio Pablo Sanchez Cordero Gonzalez
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
