Productivity and performance with embedded domain-specific languages


Abstract/Contents

Abstract
Modern computing is transitioning from a predominantly sequential processing model to heterogeneous platforms that combine sequential, parallel, specialized, and distributed processors. However, mainstream tools and programming languages have not kept up with these changes. In particular, the generality of high-level programming language abstractions makes it difficult to perform sophisticated parallel optimization or generate code efficiently for different devices. Instead, programmers who require high performance must often abandon high-level abstractions and program using low-level models (e.g. CUDA, OpenCL) or explicitly parallel models and languages (e.g. MapReduce, X10). Implementing a high performance solution is time-consuming, error-prone, and usually not portable to different devices or datasets. This additional programming complexity means that high performance solutions are out of reach for most non-experts.

We believe that the specialization of programming languages into domain-specific languages (DSLs) enables new opportunities for productive high performance programming on modern systems. We demonstrate that DSL compilers can exploit high-level data and control structure and domain-specific optimizations to efficiently compile implicitly parallel DSLs to multicore CPUs, GPUs, and clusters. By sacrificing generality, we achieve both productivity (DSL users use high-level abstractions to write their program once) and performance (the programs are compiled to run on heterogeneous systems).

We use the Lightweight Modular Staging (LMS) and Delite frameworks as a starting point. LMS and Delite are Scala frameworks that provide common, reusable components that simplify the implementation of embedded DSL compilers. We develop OptiML, an implicitly parallel DSL for machine learning, and describe its design and implementation. We show that OptiML is both productive and performant (it outperforms MATLAB and is competitive with hand-optimized C++ in nearly all cases). We present new techniques for composing compiled embedded DSLs, and validate these techniques by implementing and composing new compiled embedded DSLs for data querying (OptiQL), graph analysis (OptiGraph), scientific computing (OptiMesh), and collections (OptiCollections). The ability to compose compiled DSLs makes it possible to use them like libraries while still achieving high performance. Furthermore, we present a case study on the degree of reuse achieved across the DSL implementations and show that common compiler infrastructure has a real-world impact in reducing the effort required to build a DSL. Finally, we introduce Forge, a new meta DSL for DSL development that provides a high-level specification language for parallel and heterogeneous DSLs. This body of work demonstrates for the first time that embedded DSLs can approximate both the productivity of libraries and the performance of hand-optimized code.
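
As a brief illustration of the embedding approach described above, the following is a minimal, hypothetical Scala sketch of an LMS-style staged DSL. The names (VectorOps, Vec, vsum, IRBuilder) are illustrative only and are not the actual LMS, Delite, or OptiML API; the point is that DSL programs are written against an abstract interface whose staged type Rep[T] builds an intermediate representation rather than executing immediately, which is what allows a DSL compiler to apply domain-specific optimizations and generate code for CPUs, GPUs, or clusters.

  // Minimal sketch of a staged, embedded DSL interface (illustrative names only).
  trait VectorOps {
    type Rep[T]          // staged expression: a value of type T known only when the DSL program runs
    trait Vec            // abstract domain type (e.g. a dense vector)

    def vzeros(n: Rep[Int]): Rep[Vec]
    def vplus(a: Rep[Vec], b: Rep[Vec]): Rep[Vec]
    def vsum(v: Rep[Vec]): Rep[Double]

    // lightweight sugar so DSL programs read like ordinary code
    implicit class VecArith(a: Rep[Vec]) {
      def +(b: Rep[Vec]): Rep[Vec] = vplus(a, b)
    }
  }

  // DSL user code: written once, against the abstract interface only.
  trait MyApp extends VectorOps {
    def addAndSum(a: Rep[Vec], b: Rep[Vec]): Rep[Double] =
      vsum(a + b)        // staged: records Sum(Plus(a, b)) instead of computing it now
  }

  // One possible back end: Rep[T] is an IR node, so "running" the program
  // builds a graph that a compiler can optimize and translate to parallel code.
  object IRBuilder extends MyApp {
    sealed trait Exp[T]
    case class Zeros(n: Exp[Int])             extends Exp[Vec]
    case class Plus(a: Exp[Vec], b: Exp[Vec]) extends Exp[Vec]
    case class Sum(v: Exp[Vec])               extends Exp[Double]

    type Rep[T] = Exp[T]
    def vzeros(n: Exp[Int])             = Zeros(n)
    def vplus(a: Exp[Vec], b: Exp[Vec]) = Plus(a, b)
    def vsum(v: Exp[Vec])               = Sum(v)
  }

In the actual frameworks, an intermediate representation of this kind is where domain-specific rewrites and parallel optimizations are applied before code is generated for each target device.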

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2014
Issuance monographic
Language English

Creators/Contributors

Associated with Sujeeth, Arvind Krishna
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Olukotun, Oyekunle Ayinde
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Ré, Christopher

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Arvind Krishna Sujeeth.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2014.
Location electronic resource

Access conditions

Copyright
© 2014 by Arvind Krishna Sujeeth
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
