Bayesian assembly of reads from high throughput sequencing

Abstract/Contents

Abstract
The high-throughput sequencing revolution allows us to take millions of noisy short reads from the DNA in a sample, essentially capturing a snapshot of its genomic material. To recover the true genomes, these reads are assembled by algorithms that exploit their high coverage and overlap. I focus on two scenarios for sequence assembly. The first is de novo assembly, where the reads come from an unknown and diverse population of genomes. The second is variant assembly, where the reads come from short, clonally related genomes that are only slightly mutated relative to one another. In both cases I use the same principled Bayesian approach to design an algorithm that uncovers the composition of the genomic sequences that produced the reads. I demonstrate the algorithms' performance on real data taken from various metagenomic environments, as well as from B cells of the immune system. On the latter dataset, collected from 10 organ donors each providing 4 tissue samples, the results show evidence of clone migration between tissues and provide new insights into the organization of the immune system.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2012
Issuance monographic
Language English

Creators/Contributors

Associated with Laserson, Jonathan Daniel
Associated with Stanford University, Computer Science Department
Primary advisor Koller, Daphne
Thesis advisor Koller, Daphne
Thesis advisor Batzoglou, Serafim
Thesis advisor Fire, Andrew Zachary
Advisor Batzoglou, Serafim
Advisor Fire, Andrew Zachary

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Jonathan Laserson.
Note Submitted to the Department of Computer Science.
Thesis Thesis (Ph.D.)--Stanford University, 2012.
Location electronic resource

Access conditions

Copyright
© 2012 by Jonathan Daniel Laserson
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
