Algorithms for analysis of multiple biological sequences : theory and practice

Davydov, Eugene V; Stanford University, Computer Science Department.

Algorithms for analysis of multiple biological sequences : theory and practice

<a href="https://embed.stanford.edu/iframe/?url=https%3A%2F%2Fpurl.stanford.edu%2Ffk368by4307" class="su-underline">Show Content</a>

Abstract/Contents

Abstract: Availability of massive amounts of genomic data from hundreds of species has introduced many challenging computational problems as well as the need for efficient algorithmic tools that leverage multiple species information to facilitate biological analysis. This dissertation discusses two such problems: noncoding RNA multiple structural alignment and constrained element detection. Noncoding RNA genes (ncRNAs) are regions of the genome that are transcribed but not translated into protein, and fold directly into secondary and tertiary structures which can have a variety of important biological functions. Because their function depends closely on the secondary structure, ncRNAs often do not exhibit enough primary sequence conservation to be properly aligned using standard sequence-based methods. I therefore consider the problem of RNA multiple structural alignment, i.e., performing sequence alignment and secondary structure prediction simultaneously. In the first part of this dissertation I introduce a novel graph theoretic framework for analyzing this problem and prove that when the number of sequences is not fixed it is NP-complete. I also provide a polynomial time algorithm that approximates the optimal solution to within a factor of O(log^2 n). Constrained elements are regions of the human genome exhibiting evidence of purifying selection and therefore biological function. Computational identification of such elements is one of the major goals of comparative genomics. In the second part of this dissertation I present GERP++, a new tool for efficient constrained element detection that significantly improves on one of the current leading methods, GERP. While retaining GERP's biological transparency and metric for quantifying position-specific constraint, GERP++ uses a more rigorous method for computing evolutionary rates and a novel algorithm for element identification that uses statistical significance directly to evaluate and rank candidate elements. These algorithmic improvements decrease the running time by several orders of magnitude in practice, enabling high-throughput analysis of large data sets. Furthermore, I present analysis and biological interpretation of constrained elements identified by GERP++ in the human genome from recently available multiple species alignments.

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Copyright date	2010
Publication date	2009, c2010; 2009
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Davydov, Eugene V
Associated with	Stanford University, Computer Science Department.
Primary advisor	Batzoglou, Serafim
Thesis advisor	Batzoglou, Serafim
Thesis advisor	Dill, David L
Thesis advisor	Sidow, Arend
Advisor	Dill, David L
Advisor	Sidow, Arend

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Eugene V. Davydov.
Note	Submitted to the Department of Computer Science.
Thesis	Thesis (Ph.D.)--Stanford University, 2010.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

View in SearchWorks

Loading usage metrics...