A model-free approach to high-dimensional inference


Abstract/Contents

Abstract
Consider trying to understand the genetic basis of a disease. A natural first step would be to sequence the genomes of a large group of people and note whether each person has the disease or not. With such a data set one might hope to answer questions such as which mutations make the disease more likely, or how much of disease contraction is explained by all the mutations together as opposed to environmental factors. Unfortunately, due to the large number (hundreds of thousands or more) of locations on the genome with potential mutations, classical statistical techniques cannot be used to answer such questions. Similar problems with a response variable of interest and many potential explanatory variables (known as 'high-dimensional' problems) abound in modern statistical applications, including medicine, political science, advertising, and many more. The driving force behind the recent surge in such high-dimensional problems is that it has become easier and less expensive to collect, store, and process increasing amounts of information about individuals, such as entire genomes, medical records, or online behavior. The limitations of classical methods in high-dimensional settings demand innovation in the statistical field of high-dimensional inference. In addition to requiring creative mathematical insight, most high-dimensional inference problems are non-starters without some further assumptions about the underlying process generating the data. As such, a constant challenge and source of debate concerns the best way to make assumptions that are realistic, verifiable, and allow for fast and powerful methods. This thesis contributes to the discussion by surveying existing methods along with their assumptions, proposing a different perspective on how assumptions are made, and highlighting the benefits of that perspective by detailing two novel methods (developed jointly by the author and his collaborators) for high-dimensional inference that embody it.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2017
Issuance monographic
Language English

Creators/Contributors

Associated with Janson, Lucas Beck
Associated with Stanford University, Department of Statistics.
Primary advisor Candès, Emmanuel J. (Emmanuel Jean)
Thesis advisor Candès, Emmanuel J. (Emmanuel Jean)
Thesis advisor Hastie, Trevor
Thesis advisor Mackey, Lester

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Lucas Beck Janson.
Note Submitted to the Department of Statistics.
Thesis Thesis (Ph.D.)--Stanford University, 2017.
Location electronic resource

Access conditions

Copyright
© 2017 by Lucas Janson
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
