A discriminative modeling approach for inferring ancestry and correcting phase in large population genetics data sets


Abstract/Contents

Abstract
Local ancestry inference is an important step in both medical genetics studies and demographic studies. This is because many human populations are the result of admixture, or the interbreeding of distinct ancestral populations. The recent drastic increase in sample sizes and marker densities of population genetic data, particularly from whole-genome sequencing, provides an opportunity for computational methods to harness this data to accurately infer fine-scale local ancestry. However, current approaches to inferring local ancestry can only detect continental-level ancestry accurately and are too computationally complex to handle fully sequenced human genomes. Thus there is a need for methods that can utilize massive population genetics data sets to infer fine-scale ancestry in a computationally rapid and robust manner. In this thesis, I describe my contributions toward this goal. First, I describe a method I developed called RFMix, which uses conditional random fields parameterized by random forests to rapidly train on massive data sets, infer fine-scale local ancestry and correct phase. Second, I evaluate RFMix using simulated and real data sets and compare it to other methods. I also apply RFMix to real data sets to infer demographic histories. Finally, I develop a pipeline for generating reference panels from large databases containing mislabeled and unlabeled samples, and apply it to the massive AncestryDNA genetic database to show that using local ancestry inference as an intermediate analysis step gives better global ancestry estimates than traditional direct approaches.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Maples, Brian K
Associated with Stanford University, Program in Biomedical Informatics.
Primary advisor Bustamante, Carlos
Thesis advisor Bustamante, Carlos
Thesis advisor Altman, Russ
Thesis advisor Owen, Art B

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Brian K. Maples.
Note Submitted to the Program in Biomedical Informatics.
Thesis Thesis (Ph.D.)--Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Brian Keith Maples
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
