High-throughput biophysical assays and models to link molecular sequence variation to structural and functional consequences
Abstract/Contents
- Abstract
- Nucleic acids and proteins, the fundamental components of biological processes, are built as polymers from relatively small sets of molecular building blocks. Once synthesized, these polymeric chains fold into the 3D conformations necessary to perform their functions. The scientific community has begun to quantify this relationship between the sequence of these macromolecules, their resulting structures, and their effects as manifested through biochemical and biophysical interactions. However, the number of possible sequences grows exponentially with increasing length, rendering the task of fully characterizing their structural and functional diversity impossible. Instead, models of protein function must be built from existing data to predict effects of mutations. Such models have been widely adopted in fields like protein engineering, precision medicine, and genetically modified organism development. Unfortunately, these models require a lengthy development cycle of data collection, analysis, and testing. Here, I present three new technologies to reduce the cycle time of model development for biophysical processes. The first, described in Chapters 2 and 3, is a novel assay, named BET-seq, for the rapid in vitro characterization of transcription factor-DNA interactions. We use this assay to quantify the binding site context specificities of two yeast transcription factors, Pho4 and Cbf1, at a higher resolution than previously possible. We find that the BET-seq data are sufficient to determine not only raw sequence preferences, but also higher-order epistatic interactions within the transcription factor-DNA complex, allowing for better models of these interactions. Chapter 3 provides a detailed protocol for running the BET-seq assay along with guidelines for assessing data quality and modeling the underlying biophysical process.
Chapter 4 introduces an algorithm, named DeCoDe, to design protein-coding DNA libraries to rapidly test pools of hypothetically active protein constructs. DeCoDe uses integer linear programming to select optimal degenerate codons in pooled protein-coding DNA libraries. We show that DeCoDe significantly outperforms existing library design tools and, when used appropriately, can generate cost-effective libraries capable of screening functional protein sequence space. Finally, Chapter 5 presents preliminary efforts to model protein folding as a link-prediction problem using graph neural networks. The goal of this work is to reduce the time necessary to model the structural implications of changes to protein sequences. Taken together, these chapters provide a powerful new set of tools to rapidly design, build, and test systems of biophysical interactions.
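
For readers unfamiliar with the term, a degenerate codon is a three-letter pattern written with IUPAC ambiguity codes that stands for several concrete codons at once, and therefore for a set of amino acids. The sketch below is illustrative only: it is not DeCoDe and does no integer linear programming. It only shows what one degenerate codon expands to under the standard genetic code. The function names `expand_codon` and `encoded_amino_acids` are hypothetical, chosen for this example.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (a general standard, not specific to DeCoDe).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, listed in the conventional T, C, A, G base order
# with the third codon position varying fastest. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_codon(degenerate):
    """Return every concrete codon matched by a degenerate codon (hypothetical helper)."""
    choices = (IUPAC[base] for base in degenerate.upper())
    return ["".join(codon) for codon in product(*choices)]


def encoded_amino_acids(degenerate):
    """Return the sorted distinct amino acids ('*' = stop) a degenerate codon encodes."""
    return sorted({CODON_TABLE[codon] for codon in expand_codon(degenerate)})


if __name__ == "__main__":
    for pattern in ("NNK", "RST"):
        print(pattern, len(expand_codon(pattern)), "".join(encoded_amino_acids(pattern)))
```

The widely used `NNK` pattern, for instance, expands to 32 codons that cover all 20 amino acids plus one stop codon. A library design tool has to choose among such patterns so that the pooled library covers the desired protein sequences with as few unwanted variants as possible.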
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2020; ©2020 |
Publication date | 2020 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Shimko, Tyler Carter |
---|---|
Degree supervisor | Fordyce, Polly |
Thesis advisor | Fordyce, Polly |
Thesis advisor | Altman, Russ |
Thesis advisor | Kundaje, Anshul, 1980- |
Thesis advisor | Sherlock, Gavin |
Degree committee member | Altman, Russ |
Degree committee member | Kundaje, Anshul, 1980- |
Degree committee member | Sherlock, Gavin |
Associated with | Stanford University, Department of Genetics. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Tyler Carter Shimko |
---|---|
Note | Submitted to the Department of Genetics |
Thesis | Thesis Ph.D. Stanford University 2020 |
Location | electronic resource |
Access conditions
- Copyright
- © 2020 by Tyler Carter Shimko
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).