Application optimized computing

Qadeer, Wajahat; Stanford University, Department of Electrical Engineering.

Abstract/Contents

Abstract: All computing systems are power limited, whether the limit is 1W for a cell phone or 100W for a server. Triggered by the end of voltage scaling, these restricted power budgets arise because the energy per transistor switch is scaling more slowly than the number of transistors on a chip. Since technology scaling no longer provides the energy savings it once did, scaling performance (operations/sec) requires lowering the energy per operation, which means reducing the number of transistors involved in each operation. This fundamental change is forcing the design community to find new approaches to energy-efficient computing. Most designs use pre-designed, processor-based solutions because of their flexibility and availability; however, these are usually not the most energy-efficient solutions. To better understand the potential for producing general-purpose chips with better efficiency, this thesis analyzes in detail the types of inefficiencies that exist in general-purpose systems, which ASIC designs can outclass by up to three orders of magnitude in both performance and energy efficiency. To collect these data, we classify applications by their dominant source of energy: compute, control, or memory. For compute- and control-bound applications, we first identify the types and magnitudes of energy overheads in a general-purpose, Tensilica-based, extensible RISC chip multiprocessor (CMP) system, and then explore the architectural support and customizations needed to give a general-purpose system the same energy efficiency as an ASIC. Because the fundamental operations in compute-bound applications are generally very low-power, amortizing the overheads introduced by programmability requires executing hundreds of these operations in one cycle.
Interestingly, a high percentage of compute-bound applications share common data-flow characteristics, which we exploit to create a flexible yet efficient domain-specific processor called the Convolution Engine. Although control-bound applications also operate on low-power control-flow operations, sequential dependencies limit the number of control-flow operations that can be fused into one instruction to between ten and fifteen. This limit also bounds the efficiency achievable for control-bound applications. Unlike the low-power operations abundant in compute- and control-bound applications, the fundamental cost of a memory fetch is considerable. Improving the system efficiency of memory-bound applications therefore requires not only more efficient processing elements but also substantially more reuse of fetched data.
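The abstract's argument rests on two relations: under a fixed power budget, throughput equals power divided by energy per operation, and a fixed per-instruction overhead is shared by every operation fused into that instruction. The Python sketch below illustrates both under stated assumptions. The function names and the energy figures (1 pJ per operation, 100 pJ of overhead per instruction) are hypothetical illustrations, not measurements from the thesis.

```python
"""Sketch of the energy-per-operation argument (hypothetical numbers)."""


def energy_per_op(e_op: float, e_overhead: float, ops_per_instr: int) -> float:
    """Energy per useful operation when a fixed per-instruction overhead
    (e.g. instruction fetch, decode, register-file access) is shared by
    ops_per_instr operations fused into one instruction."""
    if ops_per_instr < 1:
        raise ValueError("ops_per_instr must be at least 1")
    return e_op + e_overhead / ops_per_instr


def max_throughput(power_budget_w: float, joules_per_op: float) -> float:
    """Operations/sec sustainable under a power budget, from
    power (J/s) = throughput (ops/s) * energy per operation (J/op)."""
    return power_budget_w / joules_per_op


if __name__ == "__main__":
    # Hypothetical energies in picojoules: a cheap arithmetic op versus a
    # much larger per-instruction overhead. Not figures from the thesis.
    E_OP_PJ = 1.0
    E_OVERHEAD_PJ = 100.0
    POWER_BUDGET_W = 1.0  # the cell-phone-class budget cited in the abstract

    # 1 op/instr (scalar), ~12 (control-bound fusion limit), 256 (wide compute)
    for n in (1, 12, 256):
        e_pj = energy_per_op(E_OP_PJ, E_OVERHEAD_PJ, n)
        ops_per_s = max_throughput(POWER_BUDGET_W, e_pj * 1e-12)
        print(f"{n:4d} ops/instr: {e_pj:8.3f} pJ/op, {ops_per_s:.3e} ops/s")
```

With these assumed numbers, fusing 256 operations brings the energy per operation to about 1.39 pJ, within roughly 40% of the bare operation cost. A control-bound limit of about 12 fused operations still leaves overhead dominant at about 9.3 pJ per operation. The sketch deliberately omits the memory-bound case: there the fetch itself is expensive, so fusing operations helps little and data reuse matters instead.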

Description

Type of resource	text
Form	electronic; electronic resource; remote
Extent	1 online resource.
Publication date	2013
Issuance	monographic
Language	English

Creators/Contributors

Associated with	Qadeer, Wajahat
Associated with	Stanford University, Department of Electrical Engineering.
Primary advisor	Horowitz, Mark (Mark Alan)
Thesis advisor	Horowitz, Mark (Mark Alan)
Thesis advisor	Kozyrakis, Christoforos, 1974-
Thesis advisor	Richardson, Stephen A
Advisor	Kozyrakis, Christoforos, 1974-
Advisor	Richardson, Stephen A

Subjects

Genre	Theses

Bibliographic information

Statement of responsibility	Wajahat Qadeer.
Note	Submitted to the Department of Electrical Engineering.
Thesis	Thesis (Ph.D.)--Stanford University, 2013.
Location	electronic resource

Access conditions

License: This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).