A systematic framework to analyze the design space of DNN accelerators

Abstract/Contents

Abstract
Deep neural networks (DNNs) have been widely used to solve many modern machine intelligence problems. However, their outstanding accuracy comes at the cost of high computational complexity, which limits their speed on conventional CPUs. This difficulty has encouraged researchers to create efficient DNN accelerators. Interestingly, these designs use a variety of approaches and have not converged over time. While each design states the advantages of its approach, without a comprehensive understanding of the global space it is difficult to understand which design choices really matter. To address this issue, this thesis shows how the space of DNN hardware accelerators can be represented as scheduling choices, including computation order, storage order, etc. Different DNN micro-architectures and mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs. This observation enables one to create a formal taxonomy of all existing dense DNN accelerators and to systematically analyze the design space, including dataflow choice. The loop transformations needed to create these hardware variants can be precisely and concisely represented by Halide's scheduling language. By modifying the Halide compiler to generate hardware, we create a system that can fairly compare these prior DNN accelerators, and we show that many different dataflows yield similar energy efficiency with good performance. As long as the memory sizes are properly chosen to accommodate an efficient blocking strategy, most data references stay on-chip with good locality and the functional units achieve high utilization. Thus, the hardware dataflow choice becomes less critical, but how resources are allocated, especially in the memory system, has a large impact on energy and performance. By optimizing hardware resource allocation while keeping throughput constant, we achieve up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), and 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons (MLPs), respectively.
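To make the loop-nest framing concrete, below is a minimal sketch (not code from the thesis) of a convolutional layer written as the seven nested loops in Halide's C++ embedding, with one illustrative schedule. All identifiers, sizes, tile factors, and the particular loop order are assumptions chosen for illustration; each alternative schedule would correspond to a different dataflow and degree of hardware parallelism.

    // Minimal Halide sketch of a conv layer as seven nested loops.
    // All names and constants here are illustrative assumptions,
    // not the thesis's actual generators or schedules.
    #include "Halide.h"
    using namespace Halide;

    int main() {
        ImageParam in(Float(32), 4);  // input activations: x, y, c_in, n
        ImageParam w(Float(32), 4);   // weights: fx, fy, c_in, k

        Var x("x"), y("y"), k("k"), n("n");
        const int K = 3, Cin = 64;    // assumed filter size and input depth
        RDom r(0, K, 0, K, 0, Cin);   // reduction loops: fx, fy, c_in

        // Algorithm: the seven loops are x, y, k, n, r.x, r.y, r.z.
        Func conv("conv");
        conv(x, y, k, n) = 0.f;
        conv(x, y, k, n) += w(r.x, r.y, r.z, k) * in(x + r.x, y + r.y, r.z, n);

        // Schedule: one point in the design space. Tiling picks a
        // blocking that keeps data in on-chip buffers; unrolling maps
        // loops onto spatially parallel function units.
        Var xo("xo"), yo("yo"), xi("xi"), yi("yi");
        conv.update()
            .tile(x, y, xo, yo, xi, yi, 16, 16)
            .reorder(xi, yi, r.x, r.y, r.z, xo, yo, k, n)
            .unroll(xi)
            .unroll(r.z, 4);

        return 0;
    }

Changing only the tile, reorder, and unroll directives, while leaving the algorithm untouched, moves the design to a different point in the taxonomy; this separation of algorithm from schedule is what makes the scheduling representation a useful lens on the accelerator design space.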

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Yang, Xuan (Researcher on computer accelerator design)
Degree supervisor Horowitz, Mark (Mark Alan)
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Fatahalian, Kayvon
Thesis advisor Kozyrakis, Christoforos, 1974-
Degree committee member Fatahalian, Kayvon
Degree committee member Kozyrakis, Christoforos, 1974-
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Xuan Yang
Note Submitted to the Department of Electrical Engineering
Thesis Thesis (Ph.D.), Stanford University, 2019
Location electronic resource

Access conditions

Copyright
© 2019 by Xuan Yang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
