Computational methods using large-scale population whole-genome sequencing data

Huang, Lin; Stanford University, Computer Science Department.

Abstract/Contents

Abstract: Advances in low-cost, high-throughput next-generation DNA sequencing technologies have made it possible to sequence large numbers of human genomes. To date, at least tens of thousands of individuals have had their whole genomes sequenced, and more large-scale population sequencing projects are underway or will be launched in the foreseeable future. This vast amount of genomic data advances the characterization of human genome variation and supports disease studies across diverse cohorts. However, efficiently and precisely determining individual-level genomic differences from this volume of sequencing data remains a challenging problem. This natural first step in leveraging sequencing data for genomic analyses can be computationally intensive, and the quality of the restored genomes influences a wide variety of downstream applications, such as association studies, personalized medicine, and population genomics. In this dissertation I present computational methods that address this fundamental problem in the context of ever-increasing sequencing data volume, and I demonstrate the effectiveness and efficiency of these methods using real data from the latest population sequencing projects. First, I present a new method that maps the reads of a newly sequenced human genome to a large collection of genomes, aiming to reduce the inherent biases induced by aligning to any single reference genome. Second, I introduce an approach, named Reveel, for single nucleotide variant calling and genotype calling in large cohorts sequenced at low coverage; it aims for computational efficiency as well as accuracy in capturing the linkage disequilibrium patterns present in rare haplotypes. Third, building on the Reveel framework, I present a reference-based approach that effectively incorporates genotypes from completed projects to improve the genotyping quality of new datasets while maintaining low computational costs. Finally, I demonstrate an application of genotype information to improving the efficiency of identity-by-descent detection in a large cohort.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2017
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Huang, Lin
Associated with	Stanford University, Computer Science Department.
Primary advisor	Batzoglou, Serafim
Thesis advisor	Batzoglou, Serafim
Thesis advisor	Kundaje, Anshul, 1980-
Thesis advisor	Pritchard, Jonathan D

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Lin Huang.
Note	Submitted to the Department of Computer Science.
Thesis	Thesis (Ph.D.)--Stanford University, 2017.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
