Single-molecule whole genome reconstruction and haplotype phasing

Abstract/Contents

Abstract
Recent advances in DNA sequencing technologies have improved the scale and cost of reading DNA by multiple orders of magnitude, enabling new ways to study evolution, genetic variation, and genome-related diseases. This dissertation focuses on developing experimental and computational techniques to address clinically and scientifically relevant applications of high-throughput DNA sequencing. We first examined single-molecule DNA sequencing by synthesis (SBS), a method that reads DNA sequence by incorporating fluorescently labeled nucleotides into short strands of DNA attached to a flow cell surface. We demonstrated the feasibility of measuring whole human genome variation and accurately detecting copy number changes. We further applied this technique to study clonal evolution in acute myeloid leukemia and found that the CD34+ 38- leukemia stem cells (LSCs) and CD34- blasts have a genome that is clearly distinct from that of normal cells, yet no significant differences are observed between the blast and LSC genomes. However, current technologies cannot resolve complex genome structure well enough to produce genome assemblies of consistency comparable to reference assemblies typically obtained by sequencing fosmid or BAC libraries. To address the shortcomings of assembling short-read sequences from complex genomes, we described a novel experimental method of molecular barcoding that allows accurate sequencing of long contiguous strands of DNA, along with its application to assembling repeat-rich genomes for which no reference is available. We applied this method to assemble the Botryllus schlosseri genome to study the genetics of histocompatibility. We further presented a computational method, statistically aided long-read haplotyping (SLRH), that extends this technique to measure haplotype information in the human genome and provides a high-resolution haplotype map of the NA12878 trio.
Because SLRH requires a substantial amount of input material, up to 1 microgram of high-molecular-weight (HMW) DNA, which may not always be possible to obtain, we presented another experimental approach to human genome haplotyping based on contiguity-preserving transposition with the Tn5 transposase and combinatorial indexing. We demonstrated accurate megabase-scale human genome haplotyping using less than 1 nanogram of input DNA. We introduced a simple, fast, and reliable approach for genome-wide haplotyping based on sequencing nearly 10,000 "virtual partitions" generated by contiguity-preserving transposition and combinatorial indexing. With as little as 40-60 Gb of sequencing data, ~94-96% of SNPs in individual human genomes can be phased. Haplotype blocks are in the megabase range, with a long switch error rate on the order of 1-2 per 10 Mb, assuring accurate phasing across the genome. The method uses readily available equipment, requires minimal hands-on time, and can potentially be integrated with microfluidic, droplet, and flow cell platforms.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2015
Issuance monographic
Language English

Creators/Contributors

Associated with Pushkarev, Dmitry
Associated with Stanford University, Department of Physics.
Primary advisor Quake, Stephen Ronald
Thesis advisor Quake, Stephen Ronald
Thesis advisor Doniach, S
Thesis advisor Greenleaf, William James

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Dmitry Pushkarev.
Note Submitted to the Department of Physics.
Thesis Thesis (Ph.D.)--Stanford University, 2015.
Location electronic resource

Access conditions

Copyright
© 2015 by Dmitry Pushkarev
