High-throughput biophysical assays and models to link molecular sequence variation to structural and functional consequences

Abstract/Contents

Abstract
Nucleic acids and proteins, the fundamental components of biological processes, are built as polymers from relatively small sets of molecular building blocks. Once synthesized, these polymeric chains fold into the 3D conformations necessary to perform their functions. The scientific community has begun to quantify this relationship between the sequence of these macromolecules, their resulting structures, and their effects as manifested through biochemical and biophysical interactions. However, the number of possible sequences grows exponentially with increasing length, rendering the task of fully characterizing their structural and functional diversity impossible. Instead, models of protein function must be built from existing data to predict effects of mutations. Such models have been widely adopted in fields like protein engineering, precision medicine, and genetically modified organism development. Unfortunately, these models require a lengthy development cycle of data collection, analysis, and testing. Here, I present three new technologies to reduce the cycle time of model development for biophysical processes. The first, described in Chapters 2 and 3, is a novel assay, named BET-seq, for the rapid in vitro characterization of transcription factor-DNA interactions. We use this assay to quantify the binding site context specificities of two yeast transcription factors, Pho4 and Cbf1, at a greater resolution than previously possible. We find that the BET-seq data are sufficient to determine not only raw sequence preferences, but also higher-order epistatic interactions within the transcription factor-DNA complex, allowing for better models of these interactions. Chapter 3 provides a detailed protocol for running the BET-seq assay along with guidelines for assessing data quality and modeling the underlying biophysical process.
Chapter 4 introduces an algorithm, named DeCoDe, to design protein-coding DNA libraries to rapidly test pools of hypothetically active protein constructs. DeCoDe uses integer linear programming to select optimal degenerate codons in pooled protein-coding DNA libraries. We show that DeCoDe significantly outperforms existing library design tools and, when used appropriately, can generate cost-effective libraries capable of screening functional protein sequence space. Finally, Chapter 5 presents preliminary efforts to model protein folding as a link-prediction problem using graph neural networks. The goal of this work is to reduce the time necessary to model the structural implications of changes to protein sequences. Taken together, these chapters provide a powerful new set of tools to rapidly design, build, and test systems of biophysical interactions.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2020; ©2020
Publication date 2020; 2020
Issuance monographic
Language English

Creators/Contributors

Author Shimko, Tyler Carter
Degree supervisor Fordyce, Polly
Thesis advisor Fordyce, Polly
Thesis advisor Altman, Russ
Thesis advisor Kundaje, Anshul, 1980-
Thesis advisor Sherlock, Gavin
Degree committee member Altman, Russ
Degree committee member Kundaje, Anshul, 1980-
Degree committee member Sherlock, Gavin
Associated with Stanford University, Department of Genetics.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Tyler Carter Shimko
Note Submitted to the Department of Genetics
Thesis Thesis Ph.D. Stanford University 2020
Location electronic resource

Access conditions

Copyright
© 2020 by Tyler Carter Shimko
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
