Balancing efficiency and flexibility in specialized computing


Abstract/Contents

Abstract
CMOS-based integrated circuits have hit a power wall, and future performance increases cannot rely on increased power budgets. This means we need to create more energy-efficient solutions if we want performance to continue to scale. A proven way to gain high efficiency is to build special-purpose ASIC chips for the application of interest. These designs can achieve 2-3 orders of magnitude higher energy efficiency and performance compared to general-purpose processors. However, ASIC design has become prohibitively expensive, making it difficult to justify the investment in design effort for all but the few applications with very large volumes and stable code bases. General-purpose processors amortize the design cost over a large number of applications and provide standard development tools, resulting in higher productivity and lower development costs. However, this flexibility comes at the cost of much higher energy consumption. This thesis examines the tradeoff between flexibility and efficiency with the aim of developing architectures that combine the low energy consumption of customized units with the reusability of general-purpose processors. A number of approaches are already being tried to lower the energy consumption of programmable systems, such as a move to homogeneous and heterogeneous multi-core systems, augmenting the processors with hardware accelerators, and creating application-specific processors. However, our work takes a step back to first understand and quantify what makes a general-purpose processor so inefficient and whether it is at all possible to get close to ASIC efficiencies within a programmable framework. The insights from this work are then used as a basis to derive new architectural ideas for efficient execution. Specifically, we propose building domain-customized functional units as a solution for balancing efficiency with flexibility. As a case study, we look at the domain of imaging and video processing.
These workloads are becoming ubiquitous across all computing devices and have very high computing requirements, often served by special-purpose hardware. At the same time, there are a large number of emerging applications in this domain with diverse requirements, so going forward there is a great need for flexible platforms for this domain. Thus, it is an ideal candidate for our study. We demonstrate two programmable functional units for this domain: the Convolution Engine and the Bilateral Engine. A number of key computational motifs common to most applications in this domain can be implemented very efficiently using these engines. The resulting performance and efficiency are within 2-3x of custom designs but an order of magnitude better than general-purpose processors with data-parallel extensions such as SIMD units. We also argue that domain-customized functional units demand a slight change in the mindset of system designers and application developers: instead of always wanting to fit the hardware to algorithm requirements, we optimize a number of key computational motifs and then restructure our applications to make maximum use of these motifs. As an example, we look at modifying the bilateral filtering algorithm, a key non-linear filter common to most computational photography applications, so that it is a good fit for the capabilities of our proposed hardware units. The resulting implementation provides over 50x energy reduction over the state-of-the-art software implementation of this algorithm. Our work suggests that identifying key data-flows and computational motifs in a domain and creating efficient-yet-flexible domain-customized functional units to optimize these motifs is a viable solution to the energy consumption problem faced by designers today.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2013
Issuance monographic
Language English

Creators/Contributors

Associated with Hameed, Rehan
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Richardson, Stephen

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Rehan Hameed.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2013.
Location electronic resource

Access conditions

Copyright
© 2013 by Rehan Hameed
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
