Theoretical models for practical problems: dynamic data structures, hierarchical clustering, and modern parallel computing
- In theoretical computer science, we take practical problems, design abstractions to model them, and then prove what efficient algorithms can or cannot compute in these models. Naturally, the choice of abstraction has ramifications for both what we can formally prove and how applicable our theorems are to the original problems we had in mind; ideally, a model attains both goals simultaneously. This is perhaps best illustrated by the field's quintessential success story: the Turing machine, a model of computation known both for its universality and for its usefulness in proofs. In this dissertation, we examine three instances where the choice of model features prominently.

Dynamic Data Structures. Lower bounds for data structures typically proceed by reduction to communication complexity: we first convert a hypothetical efficient data structure into an efficient protocol for a communication game, and then rule out the possibility of the latter. We present a novel online communication model and use it to prove robust data structure lower bounds. In particular, we show that a hypothetical data structure with asymptotically faster time per operation than the classical data structures for these problems would be so inaccurate that it might as well ignore updates and answer queries randomly.

Hierarchical Clustering. We study several popular algorithms used to perform hierarchical clustering. We first consider a previously proposed objective function for hierarchical clustering and demonstrate that, in the worst case, these algorithms are poor approximations to that objective. We then propose a related objective function and show that one method, average-linkage agglomerative clustering, achieves a constant-factor approximation to our objective, while the other algorithms remain poor approximations.

Modern Parallel Computation.
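To make the average-linkage method concrete, here is a minimal sketch of average-linkage agglomerative clustering: repeatedly merge the two clusters whose average pairwise point distance is smallest, producing a binary merge tree (dendrogram). This is an illustrative O(n^3) implementation with Euclidean distances, not the dissertation's formulation or notation.

```python
# Minimal sketch of average-linkage agglomerative clustering.
# Repeatedly merges the two clusters with the smallest average
# pairwise point distance, building a binary merge tree.
from itertools import combinations
import math

def average_linkage(points):
    # Each cluster: (list of point indices, merge tree); start with singletons.
    clusters = [([i], i) for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ia, ib = clusters[a][0], clusters[b][0]
            # Average distance between all cross-cluster point pairs.
            d = sum(math.dist(points[i], points[j]) for i in ia for j in ib)
            d /= len(ia) * len(ib)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = (clusters[a][0] + clusters[b][0],
                  (clusters[a][1], clusters[b][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][1]  # dendrogram as nested tuples of point indices
```

For example, on the four points (0,0), (0,1), (5,0), (5,1), the two tight pairs merge first and the top-level split separates them: the returned tree is ((0, 1), (2, 3)).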
We consider how efficiently algorithms implemented on parallel computation platforms such as MapReduce and Hadoop can solve central problems in their motivating application domains. To do so, we introduce a new abstract model of massively parallel computation in which the key limitation is the input size of each machine and the fundamental metric is the number of synchronous rounds needed to perform a particular computation. We show that efficient computations in our model can be represented as low-degree polynomials over the reals, allowing us to translate lower bounds from Boolean function analysis into parallel computation lower bounds. We also show that our lower bounds are the best one could hope for: asymptotically better lower bounds would separate P from NC^1.
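The round metric can be illustrated with a toy single-process simulation of such a model: machines each hold at most S input items, every machine reduces its local data in one synchronous round, and the partial results are redistributed for the next round. Aggregating n values this way takes about log base S of n rounds. The function name and the sum aggregation are illustrative assumptions, not the dissertation's exact model.

```python
# Toy simulation of a round-based massively parallel model.
# Each "machine" receives at most S items; in each synchronous
# round every machine reduces its chunk to one partial result,
# and the results feed the next round. Rounds grow as log_S(n).

def mpc_sum(values, S):
    rounds = 0
    while len(values) > 1:
        # One synchronous round: partition into machines of capacity S.
        machines = [values[i:i + S] for i in range(0, len(values), S)]
        values = [sum(m) for m in machines]  # local computation per machine
        rounds += 1
    return values[0], rounds
```

For instance, summing 100 values on machines of capacity 10 takes two rounds: ten machines produce ten partial sums, then one machine combines them.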
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Wang, Joshua R
|Degree committee member
|Stanford University, Computer Science Department.
|Statement of responsibility
|Joshua R. Wang.
|Submitted to the Computer Science Department.
|Thesis (Ph.D.)--Stanford University, 2018.
- © 2018 by Joshua Ruizhi Wang
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).