A discriminative modeling approach for inferring ancestry and correcting phase in large population genetics data sets
- Local ancestry inference is an important step in both medical genetics studies and demographic studies. This is because many human populations are the result of admixture, or the interbreeding of distinct ancestral populations. The recent drastic increase in sample sizes and marker densities of population genetic data, particularly from whole-genome sequencing, provides an opportunity for computational methods to harness these data to accurately infer fine-scale local ancestry. However, current approaches to inferring local ancestry can only detect continental-level ancestry accurately and are too computationally complex to handle fully sequenced human genomes. Thus, there is a need for methods that can utilize massive population genetics data sets to infer fine-scale ancestry in a computationally rapid and robust manner. In this thesis, I describe my contributions toward this goal. First, I describe a method I developed called RFMix, which uses conditional random fields parameterized by random forests to rapidly train on massive data sets, infer fine-scale local ancestry, and correct phase. Second, I evaluate RFMix using simulated and real data sets and compare it to other methods. I also apply RFMix to real data sets to infer demographic histories. Finally, I develop a pipeline for generating reference panels from large databases containing mislabeled and unlabeled samples, and apply it to the massive AncestryDNA genetic database to show that using local ancestry inference as an intermediate analysis step gives better global ancestry estimates than traditional direct approaches.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Maples, Brian K
|Stanford University, Program in Biomedical Informatics.
|Owen, Art B
|Statement of responsibility
|Brian K. Maples.
|Submitted to the Program in Biomedical Informatics.
|Thesis (Ph.D.)--Stanford University, 2014.
- © 2014 by Brian Keith Maples
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).