Learning human history from sequenced Y chromosomes
- The Y chromosome harbors the longest stretch of non-recombining DNA in the human genome, by orders of magnitude. Consequently, the molecule a man transmits to his son bears a record of mutations that occurred in their paternal-line ancestors. We can therefore use variation within a sample to build a detailed phylogenetic tree that is a rich source of information about historical migrations and extant population structure. Two decades of scholarship have revealed many aspects of the tree's topology and the geographic distribution of its clades, but little was known about its branch lengths. The advent of high-throughput sequencing unlocked the chromosome's full potential as an evolutionary marker. With full sequences, we could, in principle, discover variants free of ascertainment bias and with great sensitivity, but the Y chromosome's uniquely complex structure presented challenges for short-read sequencing analysis. In this dissertation, I describe my work developing methods to analyze, interpret, and extract insight from Y-chromosome sequences. After segmenting the chromosome to delineate regions amenable to short-read sequencing and developing a pipeline to reliably call genotypes, I characterized the full structure of the tree and used its branch lengths to estimate split times. First, I applied these methods to 69 individuals sampled from nine globally diverse populations and to a study of the phylogenetic and geographic structure of a common yet poorly characterized clade. Second, I extended these methods to gain insight from ancient-DNA (aDNA) specimens. To estimate a split time related to the initial colonization of the Americas, I used the "missing evolution" on the lineage of a 12,600-year-old individual buried in direct association with Clovis tools (that is, the shortfall of mutations on his lineage relative to present-day lineages), implementing a Poisson process model for mutations on the tree.
I also analyzed the Y-chromosome sequence of Kennewick Man, a 9,000-year-old individual whose population affinities had been the subject of scientific debate and legal controversy. Finally, I used Y-chromosome sequence data to help identify the likely origin of a 17th-century enslaved African whose remains were excavated in the Caribbean, and I developed a method that leverages the Y-chromosome phylogeny to estimate the genotyping error rate. I then scaled these methods to two large-scale sequencing projects. In the third part of this dissertation, I detail my analysis for the Y-chromosome subgroup of the 1000 Genomes Project, whose sample includes 1,244 males from 26 populations. To conclude, I discuss an ongoing effort to investigate the population history of Africa by capturing and sequencing the Y chromosomes of several hundred Africans sampled from diverse populations across the continent.
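The split-time estimate based on "missing evolution" can be illustrated with a minimal sketch of the general calibration idea; it is not the dissertation's implementation. It assumes that mutations accrue on every lineage as a Poisson process with one constant rate over a fixed set of callable sites, and that the ancient lineage branches directly from the node of interest. The function name `split_time_from_ancient_sample` and its arguments are hypothetical. Because the ancient individual's lineage stopped accumulating mutations when he died, its expected count falls short of present-day lineages by rate × age. That gives rate = shortfall / age, and the node's age follows as T = age × (mean present-day count) / shortfall.

```python
def split_time_from_ancient_sample(modern_counts, ancient_count, sample_age_years):
    """Hypothetical sketch: estimate the age (in years) of a node shared by an
    ancient lineage and several present-day lineages.

    Assumptions (illustrative only):
    - mutations accrue as a Poisson process at one constant rate per lineage;
    - modern_counts[i] = derived mutations on present-day lineage i below the node;
    - ancient_count = derived mutations on the ancient lineage below the node.
    """
    if not modern_counts:
        raise ValueError("need at least one present-day lineage")
    # Mean present-day branch length, in mutations, from the node to the present.
    mean_modern = sum(modern_counts) / len(modern_counts)
    # "Missing evolution": mutations the ancient lineage did not accumulate
    # because the individual died sample_age_years ago.
    missing = mean_modern - ancient_count
    if missing <= 0:
        raise ValueError("ancient lineage shows no shortfall of mutations")
    # rate = missing / age, so T = mean_modern / rate = age * mean_modern / missing.
    return sample_age_years * mean_modern / missing


# Toy counts for illustration, not data from the study.
print(split_time_from_ancient_sample([100, 104, 96], 80, 10000))
```

With these toy counts the present-day mean is 100 mutations and the shortfall is 20, so the sketch returns 50000.0 years. Since each count is Poisson-distributed, a small shortfall makes the ratio noisy, so a real analysis would also need to quantify uncertainty.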
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Poznik, G. David
|Stanford University, Program in Biomedical Informatics.
|Pritchard, Jonathan D
|Statement of responsibility
|G. David Poznik.
|Submitted to the Program in Biomedical Informatics.
|Thesis (Ph.D.)--Stanford University, 2015.
- © 2015 by Gabriel David Poznik
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).