High-level language compilers for heterogeneous accelerators

Lee, HyoukJoong; Stanford University, Department of Electrical Engineering.

Content: https://purl.stanford.edu/ts328qd2423

Abstract/Contents

Abstract: Power constraints and ever-increasing performance requirements have led hardware vendors to enhance general-purpose processors with specialized hardware components, called accelerators. To expose specific features of accelerators, vendors often introduce a new programming model or extend an existing one. As a result, application developers are faced with an ever-increasing number of disparate programming models. To maximize performance on an accelerator, application developers are required to understand the hardware details, as well as the programming model, and manually apply optimizations. Even worse, applications optimized for one accelerator need to be rewritten to run efficiently on another accelerator. To improve programmer productivity on accelerators, recent work has explored using higher-level languages. High-level languages, such as domain-specific languages (DSLs), provide application developers with target-independent language constructs so they can focus on functionality rather than performance. Compilers are responsible for automatically generating highly optimized code for each target accelerator. Despite the promising benefits of using high-level languages, this approach has not yet been very successful at a large scale. Compiler-generated code often falls short of the performance of manually optimized implementations, and compilers typically target only a specific accelerator rather than supporting multiple accelerators of different types. In this thesis, we present a principled approach for building a high-level language compiler that can target multiple accelerators from a carefully designed common representation. Our programming model is based on a set of parallel patterns, such as Map and Reduce, which serve as a high-level, target-independent computation abstraction for application developers.
Since each parallel pattern encodes semantics about parallelism, data access patterns, and synchronization, compilers can easily reason about the operation without complicated analyses. We first identify three major components of the compiler: computation, communication, and memory management. To achieve maximum performance on an accelerator, all three components must be properly optimized. For each component, we define a set of high-level primitives that succinctly captures important semantics, which are later specialized for each accelerator by the compiler. With the defined high-level primitives, we develop optimizations that are common across different accelerators as well as optimizations that are target-specific. We show the impact of these optimizations on a set of applications in various domains. By sharing common optimizations across accelerators, adding support for a new type of accelerator becomes an incremental task. Next, we focus on two important classes of accelerators, GPUs and FPGAs, and develop compiler analyses and optimizations for the computation component. Parallel patterns are often nested in many applications, exposing parallelism at multiple levels. As GPUs have hierarchical computation units with certain locality between them, we present compiler techniques to automatically map nested patterns onto GPUs to maximize parallelism and locality. For FPGAs, we present a set of optimizations required to maximize memory bandwidth and resource utilization. Exploiting the high-level semantics encoded in the parallel patterns enables these optimizations. Finally, as DSLs have been drawing attention for high performance and productivity, we demonstrate how our compiler can be extended as a compiler infrastructure for developing DSLs.
Since DSL operations can be implemented as a composition of the parallel patterns we support, DSL developers can focus on designing domain operations and mapping them to the patterns while relying on our compiler to automatically target and optimize for heterogeneous accelerators. We demonstrate this approach with OptiML, a machine learning DSL we implemented using our compiler infrastructure.
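
Illustration (not from the thesis): the abstract's claim that domain operations can be written as compositions of parallel patterns can be sketched in a few lines of Python. The names map_pattern, reduce_pattern, dot, and mean are hypothetical and chosen only for this example; they are not the thesis compiler's API or OptiML syntax. The sketch runs sequentially and only shows the structure a compiler could exploit: a Map has no dependence between iterations, and a Reduce uses an associative operator.

```python
from functools import reduce as _fold
from operator import add


def map_pattern(f, xs):
    """Map: apply f to every element; iterations are independent of each other."""
    return [f(x) for x in xs]


def reduce_pattern(op, zero, xs):
    """Reduce: combine elements with an associative op, starting from identity zero."""
    return _fold(op, xs, zero)


# Hypothetical domain operations expressed only in terms of the two patterns.
def dot(u, v):
    """Dot product as a Map over indices followed by a Reduce with addition."""
    products = map_pattern(lambda i: u[i] * v[i], range(len(u)))
    return reduce_pattern(add, 0.0, products)


def mean(xs):
    """Arithmetic mean as a Reduce with addition, scaled by the element count."""
    return reduce_pattern(add, 0.0, xs) / len(xs)


if __name__ == "__main__":
    print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(mean([2.0, 4.0, 6.0]))
```

Because dot and mean never spell out a loop order, a backend would be free to run the Map in parallel and the Reduce as a tree. That freedom is the kind of target independence the abstract attributes to parallel patterns.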

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2016
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Lee, HyoukJoong
Associated with	Stanford University, Department of Electrical Engineering.
Primary advisor	Olukotun, Oyekunle Ayinde
Thesis advisor	Olukotun, Oyekunle Ayinde
Thesis advisor	Dally, William J
Thesis advisor	Ré, Christopher
Advisor	Dally, William J
Advisor	Ré, Christopher

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	HyoukJoong Lee.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2016.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
