Predicting protein function and protein-ligand interactions through machine learning

Abstract/Contents

Abstract
Modern advances in high-throughput technology have made more genomes, sequences, and protein structures available. An important scientific endeavor is to apply this information towards combating human diseases and disorders. Two key steps in this task are understanding the function of proteins and developing the means to modulate their behavior. Experimental assays lack the throughput needed to fully characterize the function and drug-binding preferences of the many newly identified proteins. Computational assessment is an attractive alternative, but current algorithms have many shortcomings. Function prediction tools struggle to annotate proteins that are novel in sequence and structure; ligand-binding predictors have limited accuracy, as they are largely physics-based with many approximations built into the calculation of intermolecular interactions. The wealth of biological information, specifically protein structure data, presents an opportunity to take data-driven, machine learning approaches to these scientific questions. This dissertation thus presents novel computational algorithms for predicting protein function and small molecule interactions that merge protein structure data with machine learning. The first method (HMMDF) combines protein sequence models (Hidden Markov Models) with protein structure models augmented by structural dynamics information (Dynamic FEATURE) to identify the function of proteins that are novel in sequence and structure. Applied to thioredoxin function prediction, HMMDF shows high precision and recall. The second method (FragFEATURE) addresses the prediction of protein-ligand interactions using an innovative knowledge base of protein structural environments annotated with the small molecule substructures (fragments) they bind. Given a protein structure of interest, FragFEATURE searches the knowledge base for environments similar to the query to identify statistically enriched fragments. FragFEATURE predicts fragments corresponding to known ligands of a protein target with high accuracy; in many cases, it also predicts fragments corresponding to known inhibitors of the target. Using this fragment binding predictor, I identified fragments for two bacterial proteins involved in pathogenesis and antibiotic resistance. These fragments may lead to the development of inhibitors for these therapeutically important protein targets. In summary, the work presented in this dissertation represents novel and powerful methods for interrogating protein function and protein-ligand interactions, strengthening the repertoire of computational tools that assist in the understanding and treatment of human diseases and disorders.
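The knowledge-base search described in the abstract (find environments similar to a query, then report statistically enriched fragments) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the Euclidean distance cutoff, the hypergeometric enrichment test, the significance threshold, and every name below (`hypergeom_sf`, `enriched_fragments`, `radius`, `alpha`) are assumptions chosen only to make the idea concrete.

```python
from math import comb, dist


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N items,
    K of which carry the fragment of interest."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_fragments(query, knowledge_base, radius, alpha=0.05):
    """Return {fragment: p_value} for fragments over-represented among
    knowledge-base environments within `radius` of the query vector.

    knowledge_base: list of (feature_vector, set_of_fragment_labels).
    The similarity measure and the test are illustrative assumptions.
    """
    # Environments considered "similar" to the query (assumed: Euclidean cutoff).
    neighbors = [frags for vec, frags in knowledge_base if dist(query, vec) <= radius]
    N, n = len(knowledge_base), len(neighbors)
    if n == 0:
        return {}

    results = {}
    all_fragments = set().union(*(frags for _, frags in knowledge_base))
    for frag in all_fragments:
        K = sum(frag in frags for _, frags in knowledge_base)  # background count
        k = sum(frag in frags for frags in neighbors)          # count near query
        if k == 0:
            continue
        p = hypergeom_sf(k, N, K, n)
        if p <= alpha:
            results[frag] = p
    return results
```

In this sketch, a fragment is reported when it appears among the query's neighbors more often than its background frequency in the whole knowledge base would suggest. A real system would also need multiple-testing correction and a similarity measure suited to the structural features in use.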

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Tang, Grace Wonlyn
Associated with Stanford University, Department of Bioengineering
Primary advisor Altman, Russ
Thesis advisor Altman, Russ
Thesis advisor Huang, Kerwyn Casey, 1979-
Thesis advisor Pande, Vijay
Advisor Huang, Kerwyn Casey, 1979-
Advisor Pande, Vijay

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Grace Wonlyn Tang.
Note Submitted to the Department of Bioengineering.
Thesis Thesis (Ph.D.)--Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Grace Wonlyn Tang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
