High-throughput biophysical assays and models to link molecular sequence variation to structural and functional consequences

Shimko, Tyler Carter

Abstract/Contents

Abstract: Nucleic acids and proteins, the fundamental components of biological processes, are built as polymers from relatively small sets of molecular building blocks. Once synthesized, these polymeric chains fold into the 3D conformations necessary to perform their functions. The scientific community has begun to quantify this relationship between the sequence of these macromolecules, their resulting structures, and their effects as manifested through biochemical and biophysical interactions. However, the number of possible sequences grows exponentially with increasing length, rendering the task of fully characterizing their structural and functional diversity impossible. Instead, models of protein function must be built from existing data to predict effects of mutations. Such models have been widely adopted in fields like protein engineering, precision medicine, and genetically modified organism development. Unfortunately, these models require a lengthy development cycle of data collection, analysis, and testing. Here, I present three new technologies to reduce the cycle time of model development for biophysical processes. The first, described in Chapters 2 and 3, is a novel assay, named BET-seq, for the rapid in vitro characterization of transcription factor-DNA interactions. We use this assay to quantify the binding site context specificities of two yeast transcription factors, Pho4 and Cbf1, at a higher resolution than previously possible. We find that the BET-seq data are sufficient to determine not only raw sequence preferences, but also higher-order epistatic interactions within the transcription factor-DNA complex, allowing for better models of these interactions. Chapter 3 provides a detailed protocol for running the BET-seq assay along with guidelines for assessing data quality and modeling the underlying biophysical process.
Chapter 4 introduces an algorithm, named DeCoDe, to design protein-coding DNA libraries to rapidly test pools of hypothetically active protein constructs. DeCoDe uses integer linear programming to select optimal degenerate codons in pooled protein-coding DNA libraries. We show that DeCoDe significantly outperforms existing library design tools and, when used appropriately, can generate cost-effective libraries capable of screening functional protein sequence space. Finally, Chapter 5 presents preliminary efforts to model protein folding as a link-prediction problem using graph neural networks. The goal of this work is to reduce the time necessary to model the structural implications of changes to protein sequences. Taken together, these chapters provide a powerful new set of tools to rapidly design, build, and test systems of biophysical interactions.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2020; ©2020
Publication date	2020
Issuance	monographic
Language	English

Creators/Contributors

Author	Shimko, Tyler Carter
Degree supervisor	Fordyce, Polly
Thesis advisor	Fordyce, Polly
Thesis advisor	Altman, Russ
Thesis advisor	Kundaje, Anshul, 1980-
Thesis advisor	Sherlock, Gavin
Degree committee member	Altman, Russ
Degree committee member	Kundaje, Anshul, 1980-
Degree committee member	Sherlock, Gavin
Associated with	Stanford University, Department of Genetics.

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Tyler Carter Shimko
Note	Submitted to the Department of Genetics
Thesis	Thesis Ph.D. Stanford University 2020
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
