Locality-aware task management on many-core processors

Abstract/Contents

Abstract
The landscape of computing is changing. Due to limits in transistor scaling, the traditional approach of exploiting instruction-level parallelism through wide-issue, out-of-order execution cores now provides diminishing performance gains. As a result, computer architects rely on thread-level parallelism to obtain sustainable performance improvements. In particular, many-core processors exploit parallelism by implementing multiple cores that execute in parallel. Both industry and academia agree that scaling the number of cores to hundreds or thousands is the only way to scale performance from now on. However, such a shift in design increases the demands on the processor system. As a result, the cache hierarchies on many-core processors are becoming larger and increasingly complex. Such cache hierarchies suffer from high latency and energy consumption, and non-uniform memory access effects become prevalent. Traditionally, exploiting locality was an optional way to reduce execution time and energy consumption. On a complex many-core cache hierarchy, however, failing to exploit locality can leave more cores stalled, thereby undermining the very viability of parallelism. Locality can be exploited at various hardware and software layers. By implementing private and shared caches in a multi-level fashion, recent hardware designs are already optimized for locality. However, all of this is useless if software scheduling does not arrange execution in a manner that promotes the locality available in the programs themselves. In particular, the recent proliferation of runtime-based programming systems further stresses the importance of locality-aware scheduling. Although many efforts have been made to exploit locality in a runtime, they fail to take the underlying cache hierarchy into consideration, are limited to specific programming models, and suffer from high management costs.
This thesis shows that locality-aware schedules can be generated at low cost by utilizing high-level information. In particular, by optimizing a MapReduce runtime on a multi-socket many-core system, we show that runtimes can leverage explicit producer-consumer information to exploit locality. Specifically, locality in the data structures that buffer intermediate results becomes especially important. In addition, the optimization should be performed across all the software layers. To handle the case where explicit data dependency information is not available, we develop a graph-based locality analysis framework that allows key scheduling attributes to be analyzed independently of hardware specifics and scale. Using the framework, we also develop a reference scheduling scheme that shows significant performance improvement and energy savings. We then develop a novel class of practical locality-aware task managers that leverage workload pattern information and simple locality hints to approximate the reference scheduling scheme. Through experiments, we show that the quality of the generated schedules can match that of the reference scheme, and that schedule generation costs are minimal. While exploiting significant locality, these managers keep the simple task programming interface intact. We also point out that task stealing can be made compatible with locality-aware scheduling. Traditional task management schemes assumed that a fundamental tradeoff exists between locality and load balance, and favored one at the expense of the other. We show that a stealing scheme can be made locality-aware by trying to preserve the original schedule while transferring tasks for load balancing. In summary, utilizing high-level information allows the construction of efficient locality-aware task management schemes that make programs run faster while consuming less energy.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2012
Issuance monographic
Language English

Creators/Contributors

Associated with Yoo, Richard Myungon
Associated with Stanford University, Department of Electrical Engineering
Primary advisor Kozyrakis, Christoforos, 1974-
Primary advisor Olukotun, Oyekunle Ayinde
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Rosenblum, Mendel
Advisor Rosenblum, Mendel

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Richard Myungon Yoo.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2012.
Location electronic resource

Access conditions

Copyright
© 2012 by Richard Myungon Yoo
