Application optimized computing

Abstract/Contents

Abstract
All computing systems are power limited, whether by the 1 W limit of a cell phone or the 100 W limit of a server. Triggered by the end of voltage scaling, these restricted power budgets are the result of energy per transistor switch scaling more slowly than the number of transistors on the chip. Since technology scaling no longer provides the energy savings it once did, to scale performance (operations/sec) we must improve the energy per operation by reducing the number of transistors involved in each operation. This fundamental change is forcing the design community to find new approaches to energy-efficient computing. Most designs use pre-designed processor-based solutions because of their flexibility and availability. However, these are usually not the most energy-efficient solutions. To better understand the potential for producing general-purpose chips with better efficiency, this thesis analyzes in detail the types of inefficiencies that exist in general-purpose systems, which ASIC designs can outclass by up to three orders of magnitude in both performance and energy efficiency. To collect this data, we classify applications by their dominant source of energy consumption: compute, control, or memory. For compute-bound and control-bound applications, we gather this data by first identifying the types and magnitudes of energy overheads that exist in a general-purpose, Tensilica-based extensible RISC chip multiprocessor (CMP) system, and then by exploring the architectural support and customizations needed to give a general-purpose system the same energy efficiency as an ASIC. Because the fundamental operations in compute-bound applications are generally very low-power, amortizing the overheads introduced by programmability requires executing hundreds of these operations in one cycle.
Interestingly, a high percentage of compute-bound applications share common data-flow characteristics, which we exploit to create a flexible yet efficient domain-specific processor called the Convolution Engine. Although control-bound applications also consist of low-power control-flow operations, sequential dependencies restrict the number of control-flow operations that can be fused into one instruction to between ten and fifteen. This restriction also defines the extent of achievable efficiency for control-bound applications. Unlike the low-power operations abundant in compute-bound and control-bound applications, a memory fetch has a considerable fundamental cost. Improving the system efficiency of memory-bound applications therefore requires not only improving the efficiency of the processing elements, but also substantially increasing the reuse of fetched data.

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2013
Issuance monographic
Language English

Creators/Contributors

Associated with Qadeer, Wajahat
Associated with Stanford University, Department of Electrical Engineering.
Primary advisor Horowitz, Mark (Mark Alan)
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Richardson, Stephen A

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Wajahat Qadeer.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2013.
Location electronic resource

Access conditions

Copyright
© 2013 by Wajahat Qadeer
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC).
