Fast kernel evaluation in high dimensions : importance sampling and near neighbor search

Placeholder Show Content

Abstract/Contents

Abstract
High-dimensional datasets arise in a multitude of ways: naturally, due to the need to represent complex objects (images, audio), and synthetically, due to the need to represent complex relationships between objects through vector embeddings (e.g. GloVe, ImageNet). At the same time, these datasets are of massive scale on the order of million or billion points and even executing simple computational tasks, like evaluating a predictive model, for all points in the dataset can be time-consuming. These considerations along with the fact that such computational primitives are building blocks for applications that are becoming ubiquitous in everyday life raise the need for efficient and reliable algorithms. We study the problem of fast kernel evaluation that underlies kernel-based predictive models. Given a kernel function k(x, y), that takes as input two vectors and outputs a number in [0,1], we ask the question of how fast we can output an approximation to the average value of the kernel function (kernel density) between a query vector and all points in a dataset. This problem has a long history of study in Scientific Computing but before our work under worst case assumptions all methods either run in time exponential in the dimension (curse of dimensionality) or linear in the number of points in the dataset. In this thesis, we lift the curse of dimensionality from the problem of Kernel Density Evaluation for a wide range of kernels through a variety of new techniques at the core of which lie Randomized Space Partitions. Using randomized space partitions in an increasingly sophisticated manner, we obtain the first data structures that under worst-case assumptions provably approximate the kernel density at arbitrary query points in time polynomial in the dimension and sub-linear in the number of points. In the process, we draw ideas from Approximation Theory, Dimensionality Reduction, and Near Neighbor Search

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2019; ©2019
Publication date 2019; 2019
Issuance monographic
Language English

Creators/Contributors

Author Syminelakis, Paraskevas
Degree supervisor Charikar, Moses
Thesis advisor Charikar, Moses
Thesis advisor Lall, Sanjay
Thesis advisor Wootters, Mary
Degree committee member Lall, Sanjay
Degree committee member Wootters, Mary
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Paraskevas Syminelakis
Note Submitted to the Department of Electrical Engineering
Thesis Thesis Ph.D. Stanford University 2019
Location electronic resource

Access conditions

Copyright
© 2019 by Paraskevas Syminelakis
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

Loading usage metrics...