Theoretical models for practical problems: dynamic data structures, hierarchical clustering, and modern parallel computing


Abstract/Contents

Abstract
In theoretical computer science, we take practical problems, design abstractions to model them, and then prove what efficient algorithms can or cannot compute in these models. Naturally, our choice of abstraction can have ramifications on what we can formally prove and how applicable our theorems are to the original problems we had in mind. Ideally, our model attains both of these goals simultaneously. This is perhaps best illustrated by its quintessential success story: the Turing machine, a model of computation known for its universality and usefulness in proofs. In this dissertation, we examine three instances where choice of model features prominently. Dynamic Data Structures. Lower bounds for data structures typically proceed by reduction to communication complexity, where we first convert a hypothetical efficient data structure into an efficient protocol for a communication game, and then rule out the possibility of the latter. We present a novel online communication model and then use it to prove robust data structure lower bounds. In particular, we show that a hypothetical efficient data structure with asymptotically faster time per operation than classical data structures for these problems is so inaccurate that it might as well be ignoring updates and answering queries randomly. Hierarchical Clustering. We study several popular algorithms that people use to perform hierarchical clustering. We first consider a previously proposed objective function for hierarchical clustering, but we demonstrate that in the worst-case, these algorithms are poor approximations to this objective. We then propose a related objective function, and show that one method, average linkage agglomerative clustering, is a constant-approximation to our objective, while the other algorithms remain poor approximations. Modern Parallel Computation. 
We consider the question of how efficiently algorithms implemented on parallel computation platforms such as MapReduce and Hadoop can solve central problems in motivating application domains. To do so, we introduce a new abstract model for massively parallel computation, where the key limitation is the input size of each machine and the fundamental metric is the number of synchronous rounds needed to perform a particular computation. We show how efficient computations in our model can be represented as low-degree polynomials over the reals, allowing us to translate lower bounds from Boolean function analysis into parallel computation bounds. We also show that our lower bounds are the best that one could hope for; asymptotically better lower bounds would separate P from NC1.
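The average linkage agglomerative clustering mentioned in the abstract works bottom-up: start with every point as its own cluster, then repeatedly merge the pair of clusters whose points are closest on average. A minimal sketch of that procedure (the distance function, data, and stopping parameter here are illustrative, not taken from the dissertation):

```python
from itertools import combinations

def average_linkage(points, dist, k=1):
    """Agglomerative clustering: repeatedly merge the pair of clusters
    with the smallest average pairwise distance, until k clusters remain."""
    clusters = [[p] for p in points]  # start with singleton clusters
    while len(clusters) > k:
        # find the pair of clusters minimizing the average pairwise distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sum(dist(a, b)
                               for a in clusters[ij[0]]
                               for b in clusters[ij[1]])
                           / (len(clusters[ij[0]]) * len(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters

# Example: one-dimensional points with two natural groups.
pts = [0.0, 0.1, 0.2, 10.0, 10.1]
result = average_linkage(pts, dist=lambda a, b: abs(a - b), k=2)
```

Running the full merge process down to a single cluster (k=1) records a merge tree, which is the hierarchy that the dissertation's objective function evaluates; the sketch above stops early only to make the grouping visible.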

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2018
Publication date 2018
Issuance monographic
Language English

Creators/Contributors

Author Wang, Joshua R.
Degree supervisor Roughgarden, Tim
Thesis advisor Roughgarden, Tim
Thesis advisor Reingold, Omer
Thesis advisor Wootters, Mary
Degree committee member Reingold, Omer
Degree committee member Wootters, Mary
Associated with Stanford University, Computer Science Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Joshua R. Wang.
Note Submitted to the Computer Science Department.
Thesis Thesis (Ph.D.)--Stanford University, 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Joshua Ruizhi Wang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
