Statistical learning for large-scale survival data

Li, Ruilin

Abstract/Contents

Abstract: The constantly growing population biobanks have provided scientists and researchers with unprecedented opportunities to understand the genetics of human diseases. Survival analysis gives insight into the association between predictors and time-to-event responses and is particularly suitable for such data. On the other hand, millions of genetic variants sequenced from hundreds of thousands of individuals also pose computational challenges. Chapter 1 and Chapter 3 of this dissertation present three methods that reduce the memory requirement and improve the computational speed of analyzing such data. The first method is a variable screening procedure that exploits the sparsity of the association between the predictors and the response in high-dimensional datasets, which reduces the frequency of expensive I/O operations for larger-than-RAM data. The second method uses a 2-bits-per-entry compact representation designed specifically for genetic matrices, which further reduces the memory requirement and makes our bandwidth-bound optimization algorithm scalable to more CPU cores. The third method combines the compact representation for genetic variants with a simplified version of the compressed sparse block format to represent genetic data with a large number of rare variants. The prediction performance of survival models suffers when the number of censored survival times is large. This can happen if we define the survival time as the age of onset of a rare disease. In Chapter 2, I provide a group-sparse regression-based algorithm to boost the prediction performance on such data. This method is applicable when there are other survival responses that have a large number of observed events and are associated with the same predictors as the rare-event response. Finally, Chapter 4 provides a baseline-adjusted concordance index as a stable evaluation metric for survival models. This metric is particularly useful in evaluating stratified Cox models, as well as in model selection using cross-validation.

Description

Type of resource	text
Form	electronic resource; remote; computer; online resource
Extent	1 online resource.
Place	California
Place	[Stanford, California]
Publisher	[Stanford University]
Copyright date	2021; ©2021
Publication date	2021; 2021
Issuance	monographic
Language	English

Creators/Contributors

Author	Li, Ruilin
Degree supervisor	Rivas, Manuel
Degree supervisor	Tibshirani, Robert
Thesis advisor	Rivas, Manuel
Thesis advisor	Tibshirani, Robert
Thesis advisor	Taylor, Jonathan E
Degree committee member	Taylor, Jonathan E
Associated with	Stanford University, Institute for Computational and Mathematical Engineering

Subjects

Genre	Theses
Genre	Text

Bibliographic information

Statement of responsibility	Ruilin Li.
Note	Submitted to the Institute for Computational and Mathematical Engineering.
Thesis	Thesis Ph.D. Stanford University 2021.
Location	https://purl.stanford.edu/fr646ms1849

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
