Predicting protein function and protein-ligand interactions through machine learning
Abstract/Contents
- Abstract
- With modern-day advances in high-throughput technology, we now have more genomes, sequences, and protein structures available. An important scientific endeavor is to apply this information toward combating human diseases and disorders. Two key steps in this task are understanding the function of proteins and developing the means to modulate their behavior. Experimental assays lack the throughput needed to fully characterize the function and drug-binding preferences of these many newly identified proteins. Computational assessment is an attractive alternative, but current algorithms have many shortcomings. Function prediction tools struggle to annotate proteins that are unique in sequence and structure; ligand-binding predictors have limited accuracy, as they are largely physics-based with many approximations built into the calculation of intermolecular interactions. The wealth of biological information, specifically protein structure data, presents an opportunity to take data-driven, machine learning approaches to these scientific questions. This dissertation thus presents novel computational algorithms for predicting protein function and small-molecule interactions that merge protein structure data with machine learning. The first method (HMMDF) combines protein sequence models (Hidden Markov Models) with protein structure models augmented by structural dynamics information (Dynamic FEATURE) to identify the function of proteins that are novel in sequence and structure. Applied to thioredoxin function prediction, HMMDF shows high precision and recall. The second method (FragFEATURE) addresses the prediction of protein-ligand interactions using an innovative knowledge base of protein structural environments annotated with the small-molecule substructures (fragments) they bind. Given a protein structure of interest, FragFEATURE searches the knowledge base for environments similar to the query to identify statistically enriched fragments. FragFEATURE predicts fragments corresponding to known ligands of a protein target with high accuracy; in many cases, it predicts fragments corresponding to known inhibitors of the target. Using this fragment-binding predictor, I identified fragments for two bacterial proteins involved in pathogenesis and antibiotic resistance. These fragments may lead to the development of inhibitors for these therapeutically important protein targets. In summary, the work presented in this dissertation represents novel and powerful methods for interrogating protein function and protein-ligand interactions, strengthening the repertoire of computational tools to assist in the understanding and treatment of human diseases and disorders.
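The abstract does not say how fragment enrichment is computed. As a rough illustration only, the sketch below assumes a one-sided hypergeometric test: given the knowledge-base environments already judged similar to a query, it asks whether a fragment appears among them more often than its overall frequency in the knowledge base would suggest. The function names, the dictionary layout, the `alpha` cutoff, and the toy data are all hypothetical and are not taken from FragFEATURE itself; the similarity search is assumed to have been done elsewhere.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K carry the fragment."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_fragments(knowledge_base, neighbor_ids, alpha=0.05):
    """Return {fragment: p_value} for fragments over-represented among neighbors.

    knowledge_base: dict mapping environment id -> set of bound fragments
                    (hypothetical layout, assumed for this sketch).
    neighbor_ids:   ids of environments already found similar to the query.
    """
    N = len(knowledge_base)
    n = len(neighbor_ids)
    # How many environments in the whole knowledge base bind each fragment.
    background = Counter(f for frags in knowledge_base.values() for f in frags)
    # How many of the similar environments bind each fragment.
    observed = Counter(f for env in neighbor_ids for f in knowledge_base[env])
    results = {}
    for fragment, k in observed.items():
        p = hypergeom_tail(k, N, background[fragment], n)
        if p <= alpha:
            results[fragment] = p
    return results


# Toy, made-up knowledge base: 10 environments, 2 fragment types.
toy_kb = {f"e{i}": set() for i in range(10)}
for env in ("e0", "e1", "e2"):
    toy_kb[env].add("benzene")
for env in ("e0", "e1", "e3", "e4", "e5", "e6"):
    toy_kb[env].add("amide")

toy_result = enriched_fragments(toy_kb, ["e0", "e1", "e2"])
print(toy_result)
```

In this toy case "benzene" is present in all three neighbors but only three of ten environments overall, so it passes the cutoff, while "amide" is common throughout the knowledge base and does not. A real pipeline would also need multiple-testing correction across fragments, which this sketch omits.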
Description
Type of resource | text |
---|---|
Form | electronic; electronic resource; remote |
Extent | 1 online resource. |
Publication date | 2014 |
Issuance | monographic |
Language | English |
Creators/Contributors
Associated with | Tang, Grace Wonlyn |
---|---|
Associated with | Stanford University, Department of Bioengineering |
Primary advisor | Altman, Russ |
Thesis advisor | Altman, Russ |
Thesis advisor | Huang, Kerwyn Casey, 1979- |
Thesis advisor | Pande, Vijay |
Advisor | Huang, Kerwyn Casey, 1979- |
Advisor | Pande, Vijay |
Subjects
Genre | Theses |
---|---|
Bibliographic information
Statement of responsibility | Grace Wonlyn Tang. |
---|---|
Note | Submitted to the Department of Bioengineering. |
Thesis | Thesis (Ph.D.)--Stanford University, 2014. |
Location | electronic resource |
Access conditions
- Copyright
- © 2014 by Grace Wonlyn Tang
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).