Fast kernel evaluation in high dimensions : importance sampling and near neighbor search
Abstract/Contents
- Abstract
- High-dimensional datasets arise in a multitude of ways: naturally, due to the need to represent complex objects (images, audio), and synthetically, due to the need to represent complex relationships between objects through vector embeddings (e.g. GloVe, ImageNet). At the same time, these datasets are of massive scale on the order of million or billion points and even executing simple computational tasks, like evaluating a predictive model, for all points in the dataset can be time-consuming. These considerations along with the fact that such computational primitives are building blocks for applications that are becoming ubiquitous in everyday life raise the need for efficient and reliable algorithms. We study the problem of fast kernel evaluation that underlies kernel-based predictive models. Given a kernel function k(x, y), that takes as input two vectors and outputs a number in [0,1], we ask the question of how fast we can output an approximation to the average value of the kernel function (kernel density) between a query vector and all points in a dataset. This problem has a long history of study in Scientific Computing but before our work under worst case assumptions all methods either run in time exponential in the dimension (curse of dimensionality) or linear in the number of points in the dataset. In this thesis, we lift the curse of dimensionality from the problem of Kernel Density Evaluation for a wide range of kernels through a variety of new techniques at the core of which lie Randomized Space Partitions. Using randomized space partitions in an increasingly sophisticated manner, we obtain the first data structures that under worst-case assumptions provably approximate the kernel density at arbitrary query points in time polynomial in the dimension and sub-linear in the number of points. In the process, we draw ideas from Approximation Theory, Dimensionality Reduction, and Near Neighbor Search
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2019; ©2019 |
Publication date | 2019; 2019 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Syminelakis, Paraskevas |
---|---|
Degree supervisor | Charikar, Moses |
Thesis advisor | Charikar, Moses |
Thesis advisor | Lall, Sanjay |
Thesis advisor | Wootters, Mary |
Degree committee member | Lall, Sanjay |
Degree committee member | Wootters, Mary |
Associated with | Stanford University, Department of Electrical Engineering. |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Paraskevas Syminelakis |
---|---|
Note | Submitted to the Department of Electrical Engineering |
Thesis | Thesis Ph.D. Stanford University 2019 |
Location | electronic resource |
Access conditions
- Copyright
- © 2019 by Paraskevas Syminelakis
- License
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
Also listed in
Loading usage metrics...