Realm : performance portability through composable asynchrony
- Modern supercomputers are growing increasingly complicated. The laws of physics have forced processor counts into the thousands or even millions, resulted in the creation of deep distributed memory hierarchies, and encouraged the use of multiple processor and memory types in the same system. Developing an application that can fully utilize such a system is very difficult. Developing an application that runs well on more than one such system with current programming models is so daunting that it is generally not even attempted. The Legion project attempts to address these challenges by combining a traditional hierarchical application structure (i.e., tasks/functions calling other tasks/functions) with a hierarchical data model (logical regions, which may be partitioned into subregions), and by introducing the concept of mapping, a process in which the tasks and regions of a machine-agnostic description are assigned to the processors and memories of a particular machine.
This dissertation focuses on Realm, the "low-level" runtime that manages the execution of a mapped Legion application. Realm is a fully asynchronous, event-based runtime. Realm operations are deferred by the runtime, and each returns an event that triggers upon completion of the operation. These events may be used as preconditions for other operations, allowing arbitrary composition of asynchronous operations. The resulting operation graph naturally exposes the available parallelism in the application, as well as opportunities for hiding the latency of any required communication. While asynchronous task launches and non-blocking data movement are fairly common in existing programming models, Realm makes all runtime operations asynchronous: this includes resource management, performance feedback, and even, apparently paradoxically, synchronization primitives.
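The deferred-execution model described above can be illustrated with a toy, single-threaded Python sketch. Everything in it is hypothetical: `Event`, `Runtime`, `spawn`, `merge`, and `run_until_idle` are illustrative names, not Realm's actual (C++) API, and a real runtime would execute ready operations concurrently across processors and nodes. The sketch shows only the two properties the abstract emphasizes: launching an operation returns a completion event immediately, and events can be passed as preconditions (or merged) to compose operations into a graph.

```python
from collections import deque

# Hypothetical, single-threaded sketch of deferred execution with events;
# not Realm's API.

class Event:
    """Triggers at most once; callbacks registered before then run on trigger."""

    def __init__(self):
        self.triggered = False
        self._waiters = []

    def add_waiter(self, fn):
        if self.triggered:
            fn()  # precondition already satisfied
        else:
            self._waiters.append(fn)

    def trigger(self):
        if self.triggered:
            raise RuntimeError("an event may trigger only once")
        self.triggered = True
        waiters, self._waiters = self._waiters, []
        for fn in waiters:
            fn()


class Runtime:
    def __init__(self):
        self.ready = deque()  # operations whose preconditions have triggered
        self.log = []         # names of operations, in execution order

    def spawn(self, name, fn, precondition=None):
        """Defer fn until precondition triggers; return its completion event."""
        done = Event()

        def run():
            fn()
            self.log.append(name)
            done.trigger()

        if precondition is None:
            self.ready.append(run)
        else:
            precondition.add_waiter(lambda: self.ready.append(run))
        return done  # returned immediately; nothing has executed yet

    def merge(self, *events):
        """Return an event that triggers once every input event has triggered."""
        merged = Event()
        remaining = len(events)

        def one_done():
            nonlocal remaining
            remaining -= 1
            if remaining == 0:
                merged.trigger()

        if remaining == 0:
            merged.trigger()
        for e in events:
            e.add_waiter(one_done)
        return merged

    def run_until_idle(self):
        while self.ready:
            self.ready.popleft()()


rt = Runtime()
a = rt.spawn("load A", lambda: None)
b = rt.spawn("load B", lambda: None)
c = rt.spawn("compute", lambda: None, precondition=rt.merge(a, b))
d = rt.spawn("store", lambda: None, precondition=c)
print(rt.log, d.triggered)  # [] False
rt.run_until_idle()
print(rt.log, d.triggered)  # ['load A', 'load B', 'compute', 'store'] True
```

All four operations are issued before any of them runs; the event graph, not the order of the calls, determines when each operation becomes ready.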
Important design and implementation issues of Realm will be discussed, including the novel generational event data structure that allows Realm to efficiently and scalably handle a very large number of events in a distributed environment, and the machine model that provides the information required to map a Legion application onto a system. Realm anticipates dynamic behavior of both future applications and future systems, and includes mechanisms for application-directed profiling, fault reporting, and dynamic code generation that further improve performance portability by allowing an application to adapt to and optimize for the exact system configuration used for each run. Microbenchmarks demonstrate the efficiency and scalability of Realm and justify some of the non-obvious design decisions (e.g., unfairness in locks). Experiments with several mini-apps measure the benefit of a fully asynchronous runtime compared to existing "non-blocking" approaches. Finally, the performance of Legion applications at full scale shows how Realm's composable asynchrony and support for heterogeneity benefit the overall Legion system on a variety of modern supercomputers.
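The idea behind generational events can be sketched in the same toy style. The assumption here, a simplification of the description above, is that one reusable slot names a whole sequence of events, each identified by a (slot, generation) pair, with at most one untriggered generation per slot at a time. Because generations trigger in order, asking whether an old event has triggered only requires comparing its generation with the slot's counter, so triggered events need no storage of their own. `EventHandle`, `GenEventSlot`, and `GenEventTable` are hypothetical names, not Realm's implementation; the sketch is single-process and omits what a distributed version would need, such as encoding the owning node in each handle and forwarding triggers to remote waiters.

```python
from dataclasses import dataclass, field

# Hypothetical, single-process sketch of generational events; not Realm's code.

@dataclass(frozen=True)
class EventHandle:
    slot: int  # which reusable slot names this event
    gen: int   # which generation of that slot


@dataclass
class GenEventSlot:
    generation: int = 0        # highest generation that has triggered
    has_pending: bool = False  # has generation + 1 been handed out yet?
    waiters: list = field(default_factory=list)  # waiters on generation + 1


class GenEventTable:
    def __init__(self):
        self.slots = []
        self.free = []  # slots with no untriggered generation outstanding

    def create_event(self):
        if self.free:
            idx = self.free.pop()  # reuse a slot: no new storage needed
        else:
            idx = len(self.slots)
            self.slots.append(GenEventSlot())
        slot = self.slots[idx]
        slot.has_pending = True
        return EventHandle(idx, slot.generation + 1)

    def has_triggered(self, ev):
        # Generations trigger in order, so one comparison answers the query
        # for every past event ever named by this slot.
        return ev.gen <= self.slots[ev.slot].generation

    def add_waiter(self, ev, fn):
        if self.has_triggered(ev):
            fn()
        else:
            self.slots[ev.slot].waiters.append(fn)

    def trigger(self, ev):
        slot = self.slots[ev.slot]
        if not (slot.has_pending and ev.gen == slot.generation + 1):
            raise ValueError("only the slot's current untriggered generation can trigger")
        slot.generation = ev.gen
        slot.has_pending = False
        waiters, slot.waiters = slot.waiters, []
        self.free.append(ev.slot)  # slot may now name the next generation
        for fn in waiters:
            fn()


table = GenEventTable()
fired = []
e1 = table.create_event()
table.add_waiter(e1, lambda: fired.append("e1"))
table.trigger(e1)
e2 = table.create_event()  # reuses slot 0 as generation 2
print(e1, e2)  # EventHandle(slot=0, gen=1) EventHandle(slot=0, gen=2)
print(table.has_triggered(e1), table.has_triggered(e2))  # True False
print(len(table.slots), fired)  # 1 ['e1']
```

In this sketch, the table's memory grows with the number of simultaneously untriggered events rather than with the total number of events ever created.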
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Treichler, Sean Jeffrey
|Stanford University, Department of Computer Science.
|Hanrahan, P. M. (Patrick Matthew)
|Statement of responsibility
|Sean Jeffrey Treichler.
|Submitted to the Department of Computer Science.
|Thesis (Ph.D.)--Stanford University, 2016.
- © 2016 by Sean Jeffrey Treichler
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).