Methods for energy-efficient compute and communication in deep learning
Abstract/Contents
- Abstract
- Deep neural networks (DNNs) have recently achieved state-of-the-art accuracy on visual and speech recognition tasks. However, DNNs require the communication of millions of weights and activations and billions of compute operations, and both the compute and the communication designs influence the energy efficiency of DNN hardware. We present ways to reduce the communication overhead and compute cost of DNN inference. To reduce communication overhead, we propose low-precision logarithmic encoding (LogNet) of non-uniformly distributed weights, with minimal loss in classification accuracy compared with conventional fixed-point approaches. For example, even without retraining on Inception-ResNet, 4-bit logarithmic encoding incurs only 3% degradation in top-5 accuracy. For the encoding of both weights and activations, we find that distortion error is a very useful metric in the training of quantized DNNs, and we propose dynamic clipping and dynamic resolution of activations. With all of our approaches combined, an approximately 2-bit ResNet-50 achieves 70% (state-of-the-art) top-1 accuracy; with the proposed techniques on 4-bit weights and activations, we achieve less than 1% degradation in top-1 accuracy on ImageNet and little degradation on object detection tasks. To improve compute, we propose and implement two strategies: 1) bitshift-add, motivated by LogNet, to replace expensive multipliers, and 2) passive charge-domain computation. For the second approach, we design and fabricate a chip that demonstrates energy-efficient matrix operations, the Switched-Capacitor Matrix Multiplier (SCMM), in 40-nanometer CMOS. By replacing expensive digital multipliers and accumulate registers with 40-nanometer switches and 300-attofarad unit capacitors, the SCMM achieves 11 femtojoules per bit, or 5x lower energy than digital, at 2.5 GHz.
Finally, we conclude by expanding our findings to show how LogNet can also achieve more graceful model compression than conventional approaches with limited retraining in data-sensitive environments.
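The core idea behind LogNet described above — encoding weights as signed powers of two so that multiplies reduce to bitshifts — can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation; the function names (`log_quantize`, `shift_mul`), the 4-bit default, and the exponent range are assumptions chosen for clarity.

```python
import numpy as np

def log_quantize(w, num_bits=4):
    """Quantize weights to signed powers of two (illustrative sketch).

    Each nonzero weight w maps to sign(w) * 2**round(log2(|w|)), with the
    exponent clipped to a range representable in (num_bits - 1) bits
    (the remaining bit carries the sign); zero weights stay zero.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1.0)))
    # Assumed exponent range for this sketch: [-(2**(num_bits-1) - 1), 0].
    max_exp, min_exp = 0, -(2 ** (num_bits - 1) - 1)
    exp = np.clip(exp, min_exp, max_exp)
    return np.where(mag > 0, sign * 2.0 ** exp, 0.0)

def shift_mul(x_int, exp):
    """Multiply a fixed-point (integer) activation by 2**exp via a bitshift:
    x * 2**(-k) == x >> k for integer x and k >= 0, so no multiplier is needed."""
    return x_int >> (-exp) if exp < 0 else x_int << exp
```

For example, `log_quantize` maps 0.3 to 0.25 (exponent -2), and `shift_mul(64, -2)` computes 64 * 2**-2 = 16 with a right shift; in hardware, the multiply-accumulate then needs only shifters and adders.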
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | California
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2019
Publication date | 2019
Issuance | monographic
Language | English
Creators/Contributors
Author | Lee, Edward H.
---|---
Degree supervisor | Wong, S. C.
Thesis advisor | Wong, S. C.
Thesis advisor | Murmann, Boris
Thesis advisor | Yeom, Kristen
Degree committee member | Murmann, Boris
Degree committee member | Yeom, Kristen
Associated with | Stanford University, Department of Electrical Engineering
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Edward H. Lee
---|---
Note | Submitted to the Department of Electrical Engineering.
Thesis | Thesis (Ph.D.), Stanford University, 2019.
Location | electronic resource
Access conditions
- Copyright
- © 2019 by Edward Heesung Lee
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).