Quantitative analysis of mammalian translation initiation
- A fundamental goal of biology is to predict protein expression from a genomic sequence. Any such quantitative proteomic prediction must account for translation initiation, which defines the translational reading frame and directly affects the protein synthesis rate. In the eukaryotic scanning ribosome model, the small ribosomal subunit traverses the mRNA from the 5' cap in search of a start codon, which is typically an AUG. The likelihood that a ribosome recognizes and initiates at a given start codon is defined as the translation initiation site (TIS) efficiency. The mRNA sequence surrounding the start codon is known to interact with the scanning ribosome and to affect the TIS efficiency. The goal of this work is to quantitatively predict the efficiency of initiation for every possible translation initiation site that contains an AUG start codon. To analyze the large number of possible TIS sequences, we could not rely on traditional laboratory methods and instead developed a new technique called FACS-seq, which combines high-throughput cell sorting and next-generation DNA sequencing. We applied the FACS-seq method to a genetic fluorescence reporter library representing all 65,536 possible TIS sequences spanning the 6 bases upstream and 2 bases downstream of the start codon. From our FACS-seq data, we found the TIS motif RYMRMVAUGGC to have the highest translation efficiency, where R = A or G, Y = C or U, M = A or C, and V = A, C, or G. However, fitting a dinucleotide position weight matrix to the TIS efficiency data showed that dinucleotide interactions, which cannot be conveyed in a single TIS motif, significantly affect the initiation efficiency. Here, FACS-seq was applied to the study of translation initiation, but in principle the method could be applied to any genetic library that uses a fluorescence reporter.
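The size of the library and the meaning of the degenerate motif can be checked with a short sketch. It enumerates every sequence with 6 variable bases upstream and 2 downstream of a fixed AUG, then counts how many match RYMRMVAUGGC under the letter definitions given in the abstract. The function names `library` and `matches` are illustrative and do not come from the thesis.

```python
from itertools import product

# Degenerate letters as defined in the abstract (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "M": "AC", "V": "ACG",
}

def matches(motif, seq):
    """Return True if seq is one concrete instance of the degenerate motif."""
    return len(motif) == len(seq) and all(
        base in IUPAC[code] for code, base in zip(motif, seq)
    )

def library(upstream=6, downstream=2):
    """Yield every TIS sequence: variable flanks around a fixed AUG."""
    for flank in product("ACGU", repeat=upstream + downstream):
        s = "".join(flank)
        yield s[:upstream] + "AUG" + s[upstream:]

MOTIF = "RYMRMVAUGGC"
all_sites = list(library())
hits = [s for s in all_sites if matches(MOTIF, s)]

print(len(all_sites))  # 4**8 sequences in the full library
print(len(hits))       # concrete sequences covered by the motif
```

The motif covers only a small subset of the library, which is one way to see why a single consensus string cannot describe the efficiency of every site.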
The FACS-seq dataset, combined with modeling, enabled the prediction of translation initiation efficiency for any mRNA transcript based solely on its sequence. We first investigated how mutations near an annotated start codon alter gene expression and thereby cause disease. A collection of somatic TIS mutations found in tumor samples was screened to identify mutations that altered gene expression in a manner consistent with known tumor expression patterns. The identified TIS mutations may therefore have driven tumor formation by altering the protein synthesis rate. Next, we considered how leaky scanning past low-efficiency TISs allows for initiation at downstream alternative sites. Similar to transcriptional isoforms, the translational isoforms resulting from alternative initiation sites expand proteomic diversity, with important biological consequences. A quantitative leaky scanning model was used to predict mRNA transcripts with in-frame alternative initiation sites, which would generate truncated protein isoforms. These predictions were supported experimentally by ribosome footprint profiling data. In conclusion, the extensive analysis of the TIS sequence space using FACS-seq has improved our ability to quantitatively predict the efficiency and location of translation initiation.
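The idea of leaky scanning can be illustrated with a minimal sketch. It is not the model used in the thesis. It assumes that each site's efficiency is a probability between 0 and 1, that sites act independently, and that there is no reinitiation or ribosome drop-off; the function name `leaky_scanning` and the example values are hypothetical.

```python
def leaky_scanning(efficiencies):
    """Split scanning ribosomes among start sites listed 5' to 3'.

    Assumption (not from the thesis): a ribosome reaching a site
    initiates there with probability e and otherwise scans onward.
    Returns (fraction initiating at each site, fraction passing all sites).
    """
    remaining = 1.0  # fraction of ribosomes still scanning
    fractions = []
    for e in efficiencies:
        if not 0.0 <= e <= 1.0:
            raise ValueError("efficiency must lie in [0, 1]")
        fractions.append(remaining * e)
        remaining *= 1.0 - e
    return fractions, remaining

# Hypothetical transcript: a weak annotated site followed by a strong
# in-frame downstream site.
fractions, unused = leaky_scanning([0.25, 0.75])
print(fractions, unused)
```

Under these assumptions, a low-efficiency first site leaves most ribosomes available to a downstream site, which is how a truncated isoform could become a substantial share of the protein produced.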
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Noderer, William Lewis
|Stanford University, Department of Chemical Engineering.
|Wang, Clifford (Clifford Lee)
|Statement of responsibility
|William Lewis Noderer.
|Submitted to the Department of Chemical Engineering.
|Thesis (Ph.D.)--Stanford University, 2014.
- © 2014 by William Lewis Noderer