Sampling-based exploration of the folded state of a protein under kinematic and geometric constraints

Abstract/Contents

Abstract
The conformational selection model is emerging as one of the best options to model protein flexibility in ligand binding. According to this model, a protein spans an ensemble of rapidly inter-converting folded conformations, and a ligand selects the most favorable conformations to bind to from this ensemble. However, only a small number of folded conformations can be obtained experimentally (e.g., from X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy), while existing computational techniques (e.g., Molecular Dynamics and Monte-Carlo simulation) are often too expensive to sample conformations broadly distributed across the folded state of a protein. This dissertation focuses on designing new computational methods to efficiently explore the folded state of a protein using a kino-geometric sampling approach, i.e., an approach that only considers kinematic constraints (fixed bond lengths and angles) and geometric constraints (no collision of atoms modeled as hard spheres). This approach relies heavily on the fact that, in the folded state, protein atoms are densely packed. The kino-geometric approach is first applied to loop sampling, where the goal is to explore the conformation space of a loop between two secondary structure elements (helices and/or strands), assuming that the conformation of the rest of the protein is fixed and given as input. Next, it is applied to the considerably more complex problem of sampling folded conformations of an entire protein. In both cases, the proposed methods are validated by showing that the kino-geometric sampler is able to produce a relatively small distribution of conformations (of a loop in the first case, and of an entire protein in the second case) that contains one or more conformations at small Root Mean Square Deviation (RMSD) from a "target" conformation. This target conformation is not given to the kino-geometric sampler; therefore, the sampling process is not biased toward this conformation.
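As a rough illustration of the geometric constraint described above (atoms modeled as hard spheres that must not collide), the following minimal Python sketch tests a conformation for steric clashes. The function name `has_clash`, the `bonded` exclusion set, the `scale` factor, and the radii used in the usage note are assumptions made for this example; they are not taken from the dissertation.

```python
import math
from itertools import combinations


def has_clash(atoms, bonded=frozenset(), scale=1.0):
    """Return True if any two hard-sphere atoms overlap.

    atoms  -- list of (x, y, z, radius) tuples (hypothetical input format)
    bonded -- set of frozenset({i, j}) index pairs to skip, e.g. covalently
              bonded atoms that are legitimately closer than the sum of
              their radii (an assumption of this sketch)
    scale  -- factor applied to the sum of radii; values below 1.0 tolerate
              slight overlaps (also an assumption of this sketch)
    """
    for i, j in combinations(range(len(atoms)), 2):
        if frozenset((i, j)) in bonded:
            continue
        xi, yi, zi, ri = atoms[i]
        xj, yj, zj, rj = atoms[j]
        distance = math.dist((xi, yi, zi), (xj, yj, zj))
        if distance < scale * (ri + rj):
            return True  # two spheres interpenetrate
    return False
```

For example, with three atoms of radius 1.7 placed on the x-axis at 0, 1.5, and 5, `has_clash` reports a collision between the first two atoms unless that pair is listed in `bonded`. The all-pairs loop is quadratic in the number of atoms; it is kept here only for readability, and a sampler working on whole proteins would presumably need a faster neighbor search.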
The main computational difficulties in sampling folded protein conformations result from the large dimensionality of the conformation space and the very small relative volume of the folded subspace. While the small volume of the folded subspace is a necessary precondition for any sampling-based exploration method to work (otherwise a prohibitive number of samples would be required to adequately represent the subspace), it also makes it hard to sample conformations that fall into that subspace. The problem is analogous to looking for a needle in a haystack.
For loop conformation sampling, a kino-geometric sampling method based on constraint prioritization is proposed. The idea is to break the loop into several pieces and, for each piece, to satisfy the most restrictive constraints (geometric or kinematic) first. The method was tested on loops varying in length from 5 to 25 residues. Its combination with an existing functional-site prediction program (FEATURE) makes it possible to compute and recognize calcium-binding loop conformations.
To sample folded conformations of an entire protein, this dissertation proposes a set of algorithms integrated into a sampler called KGS. Inspired by two previous kino-geometric samplers (ROCK and FRODA), KGS explores the folded state of a protein by expanding a distribution of conformations from an input folded conformation (usually extracted from the Protein Data Bank). Like ROCK and FRODA, it avoids unfolding the protein by selecting stable hydrogen bonds (H-bonds) and integrating them as additional kinematic constraints. However, these H-bonds result in a protein kinematic model containing many (often several dozen) interdependent closed kinematic cycles. These cycles considerably complicate sampling operations, as the rotatable dihedral angles can no longer be perturbed independently. The contributions of KGS are threefold:
1. It uses a Jacobian-based method to simultaneously deform many cycles of a protein kinematic structure without breaking them. This method is faster than that of ROCK, and allows deformation steps of greater amplitude than FRODA.
2. KGS embeds a new non-biased diffusive strategy that expands quickly away from the input conformation and progressively samples conformations more and more densely distributed over the folded state.
3. To predict H-bond stability, KGS uses a protein-independent model in which the energetic contribution is only one predictor among others. This model, trained on molecular dynamics data, has been shown to be 20% more accurate than models based on energy alone.
Experiments show that KGS can sample functional (binding) conformations of a protein, given a non-functional (non-binding) one, even when the RMSD between the two conformations is large. This work indirectly demonstrates that kinematic and geometric constraints provide a good characterization of the folded state of a protein, despite the fact that they only implicitly and partially encode electrostatic and van der Waals (vdW) energy terms. Moreover, these kino-geometric constraints are much faster to handle computationally than energy functions, which are made of many terms.
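The abstract states only that KGS deforms closed kinematic cycles with a Jacobian-based method. One common way to do this in robot kinematics, assumed here purely for illustration, is to project a trial perturbation of the joint angles onto the null space of the constraint Jacobian, so that the closure constraint is preserved to first order. The sketch below applies that idea to a toy planar chain whose free end must stay in place; the chain model and all function names are hypothetical and do not reproduce the KGS implementation.

```python
import math


def joint_positions(angles, link=1.0):
    """Positions of all joints of a planar chain, base first, free end last."""
    pts = [(0.0, 0.0)]
    heading = 0.0
    for a in angles:
        heading += a  # each angle is relative to the previous link
        x, y = pts[-1]
        pts.append((x + link * math.cos(heading), y + link * math.sin(heading)))
    return pts


def end_jacobian(angles, link=1.0):
    """2 x n Jacobian of the free end with respect to the joint angles."""
    pts = joint_positions(angles, link)
    ex, ey = pts[-1]
    # Rotating joint i moves the end perpendicular to (end - joint_i).
    row_x = [-(ey - py) for (_, py) in pts[:-1]]
    row_y = [ex - px for (px, _) in pts[:-1]]
    return row_x, row_y


def project_to_null_space(jac, dq):
    """Return dq minus its component that moves the free end (first order).

    Computes dq - J^T (J J^T)^{-1} J dq using an explicit 2 x 2 solve.
    """
    jx, jy = jac
    a11 = sum(u * u for u in jx)
    a12 = sum(u * v for u, v in zip(jx, jy))
    a22 = sum(v * v for v in jy)
    b1 = sum(u * d for u, d in zip(jx, dq))
    b2 = sum(v * d for v, d in zip(jy, dq))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("singular configuration: Jacobian has rank < 2")
    lam1 = (a22 * b1 - a12 * b2) / det
    lam2 = (a11 * b2 - a12 * b1) / det
    return [d - lam1 * u - lam2 * v for d, u, v in zip(dq, jx, jy)]


def closure_drift(angles, dq, link=1.0):
    """Distance moved by the free end when the angles change by dq."""
    before = joint_positions(angles, link)[-1]
    after = joint_positions([a + d for a, d in zip(angles, dq)], link)[-1]
    return math.dist(before, after)
```

Calling `project_to_null_space(end_jacobian(angles), dq)` on a small trial step `dq` yields a step that changes the chain's shape while leaving its free end fixed up to second-order error, which `closure_drift` measures. Because the projection is only a first-order correction, larger steps accumulate drift and would need some additional re-closure step; how KGS handles this is not described in the abstract.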

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Copyright date 2011
Publication date 2010, c2011; 2010
Issuance monographic
Language English

Creators/Contributors

Associated with Yao, Zhen (Peggy)
Associated with Stanford University, Department of Biomedical Informatics.
Primary advisor Latombe, Jean-Claude
Thesis advisor Latombe, Jean-Claude
Thesis advisor Altman, Russ
Thesis advisor Levitt, Michael, 1947-
Advisor Altman, Russ
Advisor Levitt, Michael, 1947-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Zhen (Peggy) Yao.
Note Submitted to the Department of Biomedical Informatics.
Thesis Ph.D. Stanford University 2011
Location electronic resource

Access conditions

Copyright
© 2011 by Zhen Yao
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
