Algorithms for accurate and sensitive interpretation of mass spectra against arbitrarily large peptide search spaces

Abstract/Contents

Abstract
Tandem mass spectrometry (MS/MS) enables the high-throughput identification and characterization of complex protein mixtures, and depends critically on bioinformatics tools to interpret mass spectra as peptide sequences. There are two general techniques for interpreting mass spectra: de novo sequencing and database search. In de novo sequencing, a mass spectrum is interpreted directly as a protein sequence. In database search, a mass spectrum is identified from its best match in an existing sequence or spectrum database. Though less biased and less restrictive than database search algorithms, de novo sequencing algorithms are less popular because of their relatively lower accuracy and lack of automated statistical validation tools. However, database search algorithms suffer greatly in both speed and sensitivity as search spaces grow through the addition of protein sequences and post-translational modifications. To be able to apply MS/MS to more diverse systems, I developed the de novo sequencing algorithm Label Assisted De novo Sequencing (LADS). LADS uses chemical strategies to introduce signatures into mass spectra that improve sequencing accuracy, and employs a support vector machine-based model to discriminate true from false identifications. I also developed a method to empirically estimate false discovery rates (FDRs) for any de novo sequencing algorithm. In the last stage of my PhD, I developed TagGraph, an unrestricted database search tool able to match peptides to mass spectra from sequence databases without assuming any protease specificity or requiring a user-specified set of modifications. I demonstrate the utility of TagGraph on the recently published human proteome dataset, matching over four million spectra to modified peptides, and identifying new functional roles and disease associations for protein hydroxylation.
Both TagGraph and the de novo FDR calibration technology described herein have the potential to greatly extend the scope and depth of tandem MS analyses.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2016
Issuance monographic
Language English

Creators/Contributors

Associated with Devabhaktuni, Arun
Associated with Stanford University, Department of Chemical and Systems Biology.
Primary advisor Elias, Joshua
Thesis advisor Elias, Joshua
Thesis advisor Dill, David L
Thesis advisor Mallick, Parag, 1976-
Thesis advisor Meyer, Tobias
Advisor Dill, David L
Advisor Mallick, Parag, 1976-
Advisor Meyer, Tobias

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Arun Devabhaktuni.
Note Submitted to the Department of Chemical and Systems Biology.
Thesis Thesis (Ph.D.)--Stanford University, 2016.
Location electronic resource

Access conditions

Copyright
© 2016 by Arun Devabhaktuni
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
