Interfaces for efficient software composition on modern hardware

Abstract/Contents

Abstract
For decades, developers have been productive writing software by composing optimized libraries and functions written by other developers. Though hardware trends have evolved significantly over this time (with the ending of Moore's law, the increasing ubiquity of parallelism, and the emergence of new accelerators), many of the common interfaces for composing software have nevertheless remained unchanged since their original design. This lack of evolution is causing serious performance consequences in modern applications. For example, the growing gap between memory and processing speeds means that applications that compose even hand-tuned libraries can spend more time transferring data through main memory between individual function calls than they do performing computations. This problem is even worse for applications that interface with new hardware accelerators such as GPUs. Though application writers can circumvent these bottlenecks manually, these optimizations come at the expense of programmability. In short, the interfaces for composing even optimized software modules are no longer sufficient to best use the resources of modern hardware.
This dissertation proposes designing new interfaces for efficient software composition on modern hardware by leveraging algebraic properties intrinsic to software APIs to unlock new optimizations. We demonstrate this idea with three new composition interfaces. The first interface, Weld, uses a functional intermediate representation (IR) to capture the parallel structure of data analytics workloads underneath existing APIs, and enables powerful data movement optimizations over this IR to optimize applications end-to-end. The second, called split annotations (SAs), also focuses on data movement optimization and parallelization, but uses annotations on top of existing functions to define an algebra for specifying how data passed between functions can be partitioned and recombined to enable cross-function pipelining. The third, called raw filtering, optimizes data loading in data-intensive systems by redefining the interface between data parsers and query engines to improve CPU efficiency.
Our implementations of these interfaces have shown substantial performance benefits from rethinking the interface between software modules. More importantly, they have also shown the limitations of established interfaces. Weld and SAs show that a new interface can accelerate data science pipelines by over 100x in some cases in multicore environments, by enabling data movement optimizations such as pipelining on top of existing libraries such as NumPy and Pandas. We also show that Weld can be used to target new parallel accelerators, such as vector processors and GPUs, and that SAs can enable these speedups even on black-box libraries without any library code modification. Finally, the I/O optimizations in raw filtering show over 9x improvements in end-to-end query execution time in distributed systems such as Spark SQL when processing semi-structured data such as JSON.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2020; ©2020
Publication date 2020; 2020
Issuance monographic
Language English

Creators/Contributors

Author Palkar, Shoumik Prasad
Degree supervisor Zaharia, Matei
Thesis advisor Zaharia, Matei
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Winstein, Keith
Degree committee member Kozyrakis, Christoforos, 1974-
Degree committee member Winstein, Keith
Associated with Stanford University, Computer Science Department.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Shoumik Palkar
Note Submitted to the Computer Science Department
Thesis Thesis Ph.D. Stanford University 2020
Location electronic resource

Access conditions

Copyright
© 2020 by Shoumik Prasad Palkar
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
