Statistical inference of mutational mechanisms from human genetic variation patterns

Abstract/Contents

Abstract
Germline mutations cause a variety of debilitating genetic diseases, but also fuel evolution by acting as sources of genotypic novelty. Advances in DNA sequencing technology have made it possible, in some cases, to tease apart the molecular processes that cause mutations. At the same time, human population genetics is beginning to both contribute to and greatly benefit from our growing understanding of mutational processes and their consequences. In this thesis, I describe models and analyses of mutational mechanisms and their consequences for patterns of genetic variation in humans. In the first chapter, I describe a study of gene conversion (the copying of genetic sequence from a "donor" region to an "acceptor") between tandem gene duplicates. In nonallelic gene conversion (NAGC), the donor and the acceptor are at distinct genetic loci. Despite the role of NAGC in various genetic diseases and its implications for the concerted evolution of gene families, the rates and contributing factors of NAGC are not well characterized. Here, we survey duplicate gene families across primates and identify converted regions in 46% of duplicate gene families surveyed. These conversions reflect a large GC bias of NAGC. We further estimate the parameters governing NAGC in humans: a mean NAGC tract length of 250 bp and a rate that is an order of magnitude higher than that of point mutations (a probability of 2.5*10^-7 per generation for a nucleotide to be converted). Despite this seemingly high rate, we show that NAGC likely has only a small average effect on the sequence divergence of duplicates. This work improves our understanding of the mechanisms behind NAGC and of the role it plays in the evolution of gene duplicates.
In the second part, I describe an analysis of the determinants of the distribution of allele frequencies (otherwise known as the site frequency spectrum, or SFS) in humans. The SFS has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the "phylogenetically-conditioned SFS" or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. We additionally find evidence for epistatic effects on the cSFS; namely, that parallel primate substitutions are more informative about constraint in humans when the local sequence context is similar than when there are other nearby substitutions. In summary, we show that variable mutation rates and local epistatic effects are important determinants of the SFS in humans.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2018; ©2018
Publication date 2018; 2018
Issuance monographic
Language English

Creators/Contributors

Author Harpak, Arbel
Degree supervisor Pritchard, Jonathan D
Thesis advisor Pritchard, Jonathan D
Thesis advisor Fraser, Hunter B
Thesis advisor Rosenberg, Noah
Degree committee member Fraser, Hunter B
Degree committee member Rosenberg, Noah
Associated with Stanford University, Department of Biology.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Arbel Harpak.
Note Submitted to the Department of Biology.
Thesis Thesis Ph.D. Stanford University 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Arbel Harpak
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
