Computational analysis of the mammalian cis-regulatory landscape


Abstract/Contents

Abstract
Improvements in DNA sequencing technologies have made it possible to determine the genetic makeup of many organisms. Computational analyses of the massive amounts of sequence data now available have produced many insights into evolutionary and developmental biology. For example, comparison of the full genome sequences of human and mouse revealed that the majority of functional sequence in the human genome does not code for protein. Much of this functional non-coding sequence appears to act in a regulatory role, dictating the precise tissues and developmental time points in which each protein should be produced. This dissertation describes three major contributions to the computational analysis of regulatory elements. First, I describe the Genomic Regions Enrichment of Annotations Tool (GREAT), a novel statistical method and associated web-based tool developed to infer the biological functions of regulatory elements from the functions of their putative target genes. I demonstrate that it markedly improves on current methods for interpreting functional enrichment signals across a variety of regulatory element types.

Next, I discuss a computational methodology developed to identify medium- to large-scale (10 to 100,000 nucleotide) genomic deletions from the whole-genome sequences of multiple mammals. Using this methodology, I quantify the dispensability of highly conserved non-coding elements (CNEs) as their likelihood of being deleted in a subset of species. Despite their genomic prevalence and apparent functional redundancy, CNEs are very rarely lost in extant species. Even more surprisingly, there is only a very weak relationship between dispensability and nucleotide conservation level: sequences under purifying selection at moderate levels of nucleotide conservation are lost at rates similar to those of perfectly conserved sequences. Instead, evolutionary resistance to loss is more strongly correlated with depth of sequence homology, as ancient enhancers are more resistant to deletion than those that arose more recently in evolution.

Finally, I present the discovery and analysis of human-specific genomic deletions. By comparing the genome sequences of five species, including human and our nearest ape relative, the chimpanzee, I identified 583 regions that are present in non-human species and contain highly conserved sequence but are surprisingly deleted in humans. Statistical analyses indicate that these deletions occur preferentially near steroid hormone receptor genes and near brain-expressed genes that are known to inhibit proliferation. Experimental results highlight specific examples that may have contributed to uniquely human traits: the loss of an AR enhancer is correlated with the human loss of penile spines and sensory vibrissae, and the loss of a GADD45G enhancer is correlated with the human expansion of the cerebral cortex.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2011
Issuance monographic
Language English

Creators/Contributors

Associated with McLean, Cory Yuen Fu
Associated with Stanford University, Computer Science Department
Primary advisor Bejerano, Gill, 1970-
Thesis advisor Bejerano, Gill, 1970-
Thesis advisor Batzoglou, Serafim
Thesis advisor Kingsley, David M. (David Mark)
Advisor Batzoglou, Serafim
Advisor Kingsley, David M. (David Mark)

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Cory Yuen Fu McLean.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2011.
Location electronic resource

Access conditions

Copyright
© 2011 by Cory Yuen Fu McLean
