Theoretical models for practical problems: dynamic data structures, hierarchical clustering, and modern parallel computing
- In theoretical computer science, we take practical problems, design abstractions to model them, and then prove what efficient algorithms can or cannot compute in these models. Naturally, the choice of abstraction has ramifications for both what we can formally prove and how applicable our theorems are to the original problems we had in mind; ideally, a model attains both goals simultaneously. This is perhaps best illustrated by the field's quintessential success story: the Turing machine, a model of computation known both for its universality and for its usefulness in proofs. In this dissertation, we examine three instances where the choice of model features prominently.

Dynamic Data Structures. Lower bounds for data structures typically proceed by reduction to communication complexity: we first convert a hypothetical efficient data structure into an efficient protocol for a communication game, and then rule out the possibility of the latter. We present a novel online communication model and use it to prove robust data structure lower bounds. In particular, we show that a hypothetical data structure with asymptotically faster time per operation than the classical data structures for these problems would be so inaccurate that it might as well ignore updates and answer queries randomly.

Hierarchical Clustering. We study several popular algorithms used to perform hierarchical clustering. We first consider a previously proposed objective function for hierarchical clustering and demonstrate that, in the worst case, these algorithms are poor approximations to that objective. We then propose a related objective function and show that one method, average-linkage agglomerative clustering, achieves a constant-factor approximation to our objective, while the other algorithms remain poor approximations.

Modern Parallel Computation.
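To make the average-linkage method concrete, here is a minimal sketch of average-linkage agglomerative clustering: repeatedly merge the two clusters whose average pairwise point distance is smallest, producing a binary merge tree (dendrogram). This is an illustrative O(n^3) implementation with Euclidean distances, not the dissertation's formulation or notation.

```python
# Minimal sketch of average-linkage agglomerative clustering.
# Repeatedly merges the two clusters with the smallest average
# pairwise point distance, building a binary merge tree.
from itertools import combinations
import math

def average_linkage(points):
    # Each cluster: (list of point indices, merge tree); start with singletons.
    clusters = [([i], i) for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ia, ib = clusters[a][0], clusters[b][0]
            # Average distance between all cross-cluster point pairs.
            d = sum(math.dist(points[i], points[j]) for i in ia for j in ib)
            d /= len(ia) * len(ib)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = (clusters[a][0] + clusters[b][0],
                  (clusters[a][1], clusters[b][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][1]  # dendrogram as nested tuples of point indices
```

For example, on the four points (0,0), (0,1), (5,0), (5,1), the two tight pairs merge first and the top-level split separates them: the returned tree is ((0, 1), (2, 3)).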
We consider how efficiently algorithms implemented on parallel computation platforms such as MapReduce and Hadoop can solve central problems in their motivating application domains. To do so, we introduce a new abstract model of massively parallel computation in which the key limitation is the input size of each machine and the fundamental metric is the number of synchronous rounds needed to perform a particular computation. We show that efficient computations in our model can be represented as low-degree polynomials over the reals, allowing us to translate lower bounds from Boolean function analysis into parallel computation lower bounds. We also show that our lower bounds are the best one could hope for: asymptotically better lower bounds would separate P from NC^1.
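The round metric can be illustrated with a toy single-process simulation of such a model: machines each hold at most S input items, every machine reduces its local data in one synchronous round, and the partial results are redistributed for the next round. Aggregating n values this way takes about log base S of n rounds. The function name and the sum aggregation are illustrative assumptions, not the dissertation's exact model.

```python
# Toy simulation of a round-based massively parallel model.
# Each "machine" receives at most S items; in each synchronous
# round every machine reduces its chunk to one partial result,
# and the results feed the next round. Rounds grow as log_S(n).

def mpc_sum(values, S):
    rounds = 0
    while len(values) > 1:
        # One synchronous round: partition into machines of capacity S.
        machines = [values[i:i + S] for i in range(0, len(values), S)]
        values = [sum(m) for m in machines]  # local computation per machine
        rounds += 1
    return values[0], rounds
```

For instance, summing 100 values on machines of capacity 10 takes two rounds: ten machines produce ten partial sums, then one machine combines them.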
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Wang, Joshua R
|Degree committee member
|Stanford University, Computer Science Department.
|Statement of responsibility
|Joshua R. Wang.
|Submitted to the Computer Science Department.
|Thesis (Ph.D.)--Stanford University, 2018.
- © 2018 by Joshua Ruizhi Wang
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).