Computational methods using large-scale population whole-genome sequencing data
Abstract/Contents
- Abstract
- Advances in low-cost, high-throughput next-generation DNA sequencing technologies have made it possible to sequence large numbers of human genomes. To date, at least tens of thousands of individuals have been whole-genome sequenced, and more large-scale population sequencing projects are underway or will be launched in the foreseeable future. This vast amount of genomic data advances the characterization of human genome variation and supports disease studies across diverse cohorts. However, efficiently and precisely determining individual-level genomic differences from this volume of sequencing data remains a challenging problem. This natural first step in leveraging sequencing data for genomic analyses can be computationally intensive, and the quality of the reconstructed genomes influences a wide variety of downstream applications, such as association studies, personalized medicine, and population genomics. In this dissertation, I present computational methods that address this fundamental problem in the context of ever-increasing sequencing data volume, and I demonstrate their effectiveness and efficiency using real data from the latest population sequencing projects. First, I present a new method that maps reads from a newly sequenced human genome to a large collection of genomes, aiming to reduce the inherent biases induced by aligning to any single reference genome. Second, I introduce an approach, named Reveel, for single nucleotide variant calling and genotype calling in large cohorts sequenced at low coverage; it aims for computational efficiency as well as accuracy in capturing the linkage disequilibrium patterns present in rare haplotypes. Third, building on the Reveel framework, I present a reference-based approach that incorporates genotypes from completed projects to improve the genotyping quality of new datasets while maintaining low computational cost. Finally, I demonstrate an application of genotype information to improve the efficiency of identity-by-descent detection in a large cohort.
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2017 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Huang, Lin | |
---|---|---|
Associated with | Stanford University, Computer Science Department. | |
Primary advisor | Batzoglou, Serafim | |
Thesis advisor | Batzoglou, Serafim | |
Thesis advisor | Kundaje, Anshul, 1980- | |
Thesis advisor | Pritchard, Jonathan D | |
Subjects
Genre | Theses |
---|
Bibliographic information
Statement of responsibility | Lin Huang. |
---|---|
Note | Submitted to the Department of Computer Science. |
Thesis | Thesis (Ph.D.)--Stanford University, 2017. |
Location | electronic resource |
Access conditions
- Copyright
- © 2017 by Lin Huang
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).