A systematic framework to analyze the design space of DNN accelerators

Abstract/Contents

Abstract
Deep neural networks (DNNs) have been widely used to solve many modern machine intelligence problems. However, their outstanding accuracy comes at the cost of high computational complexity, which limits their speed on conventional CPUs. This difficulty has encouraged researchers to create efficient DNN accelerators. Interestingly, these designs use a variety of approaches and have not converged over time. While each design states the advantages of its approach, without a comprehensive understanding of the global space it is difficult to understand which design choices really matter. To address this issue, this thesis shows how the space of DNN hardware accelerators can be represented as scheduling choices, including computation order, storage order, etc. Different DNN micro-architectures and mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs. This observation enables one to create a formal taxonomy of all existing dense DNN accelerators and to systematically analyze the design space, including dataflow choice. The loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior DNN accelerators, and we show that many different dataflows yield similar energy efficiency with good performance. As long as the memory sizes are properly chosen to accommodate an efficient blocking strategy, most data references stay on-chip with good locality and the functional units achieve high utilization. Thus, the hardware dataflow choice becomes less critical, but how resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), and 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.
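To make the loop-nest framing concrete, below is a minimal sketch (not code from the thesis) of a convolutional layer written as the seven nested loops in Halide's C++ embedding, with one illustrative schedule. All identifiers, sizes, tile factors, and the particular loop order are assumptions chosen for illustration; each alternative schedule would correspond to a different dataflow and degree of hardware parallelism.

    // Minimal Halide sketch of a conv layer as seven nested loops.
    // All names and constants here are illustrative assumptions,
    // not the thesis's actual generators or schedules.
    #include "Halide.h"
    using namespace Halide;

    int main() {
        ImageParam in(Float(32), 4);  // input activations: x, y, c_in, n
        ImageParam w(Float(32), 4);   // weights: fx, fy, c_in, k

        Var x("x"), y("y"), k("k"), n("n");
        const int K = 3, Cin = 64;    // assumed filter size and input depth
        RDom r(0, K, 0, K, 0, Cin);   // reduction loops: fx, fy, c_in

        // Algorithm: the seven loops are x, y, k, n, r.x, r.y, r.z.
        Func conv("conv");
        conv(x, y, k, n) = 0.f;
        conv(x, y, k, n) += w(r.x, r.y, r.z, k) * in(x + r.x, y + r.y, r.z, n);

        // Schedule: one point in the design space. Tiling picks a
        // blocking that keeps data in on-chip buffers; unrolling maps
        // loops onto spatially parallel function units.
        Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
        conv.update()
            .tile(x, y, xo, yo, xi, yi, 16, 16)
            .reorder(xi, yi, r.x, r.y, r.z, xo, yo, k, n)
            .unroll(xi)
            .unroll(r.z, 4);

        return 0;
    }

Changing only the tile, reorder, and unroll directives, while leaving the algorithm untouched, moves the design to a different point in the taxonomy; this separation of algorithm from schedule is what makes the scheduling representation a useful lens on the accelerator design space.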

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Yang, Xuan (Researcher on computer accelerator design)
Degree supervisor Horowitz, Mark (Mark Alan)
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Fatahalian, Kayvon
Thesis advisor Kozyrakis, Christoforos, 1974-
Degree committee member Fatahalian, Kayvon
Degree committee member Kozyrakis, Christoforos, 1974-
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Xuan Yang
Note Submitted to the Department of Electrical Engineering
Thesis Thesis (Ph.D.), Stanford University, 2019
Location electronic resource

Access conditions

Copyright
© 2019 by Xuan Yang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
