Methods for energy-efficient compute and communication in deep learning


Abstract/Contents

Abstract
Deep neural networks (DNNs) have recently achieved state-of-the-art accuracy on visual and speech recognition tasks. However, DNNs require the communication of millions of weights and activations and billions of compute operations. Both compute and communication influence the energy efficiency of DNN hardware. We present methods to reduce the communication and compute costs of DNN inference. To reduce communication overhead, we propose low-precision logarithmic encoding (LogNet) of non-uniformly distributed weights, which incurs minimal loss in classification accuracy compared with conventional fixed-point approaches. For example, even without retraining, 4-bit logarithmic encoding of Inception-ResNet degrades top-5 accuracy by only 3%. For the encoding of both weights and activations, we find that distortion error is a useful metric for training quantized DNNs. We also propose dynamic clipping and dynamic resolution of activations. Combining these approaches, an approximately 2-bit ResNet-50 achieves a state-of-the-art 70% top-1 accuracy. Furthermore, with the proposed techniques applied to 4-bit weights and activations, we achieve less than 1% degradation in top-1 accuracy on ImageNet and little degradation on object detection tasks. To improve compute, we propose and implement two strategies: 1) bitshift-add, motivated by LogNet, which replaces expensive multipliers; and 2) passive charge-domain computation. For the second approach, we design and fabricate a chip in 40 nanometer CMOS, the Switched-Capacitor Matrix Multiplier (SCMM), that demonstrates energy-efficient matrix operations. By replacing expensive digital multipliers and accumulation registers with 40-nanometer switches and 300 attofarad unit capacitors, the SCMM achieves an energy per bit of 11 femtojoules at 2.5 GHz, 5x lower than a digital implementation. Finally, we conclude by extending our findings to show how LogNet also enables more graceful model compression than conventional methods when retraining is limited, as in data-sensitive environments.
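
As an illustration of the abstract's two core ideas, the following minimal Python sketch shows (1) logarithmic encoding, where each weight is rounded to a signed power of two, and (2) bitshift-add, where each multiply in a dot product becomes an arithmetic shift. The bit width, exponent range, and rounding rule here are illustrative assumptions, not the exact scheme from the thesis.

import numpy as np

def log_quantize(w, num_bits=4):
    # Encode each weight as sign(w) * 2**e with an integer exponent e.
    # One bit holds the sign; the rest select among 2**(num_bits - 1)
    # exponent levels (an illustrative allocation, not the thesis's exact one).
    levels = 2 ** (num_bits - 1)
    sign = np.sign(w)
    # Round log2|w| to the nearest integer exponent, clipped to the range.
    e = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), -levels, -1).astype(int)
    return sign, e

def shift_add_dot(x_int, sign, e):
    # Dot product with log-encoded weights: multiplying an integer
    # activation by 2**e (e < 0) is an arithmetic right shift by -e,
    # so the whole product reduces to shifts and adds (no multipliers).
    acc = 0
    for xi, si, ei in zip(x_int, sign, e):
        acc += int(si) * (int(xi) >> int(-ei))
    return acc

# Compare the shift-add result against the exact floating-point dot product.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=8)          # small weights, as in trained DNNs
x = (rng.normal(size=8) * 64).astype(int)  # integer activations
sign, e = log_quantize(w)
print(shift_add_dot(x, sign, e), float(x @ w))

Because each encoded weight is a pure power of two, hardware needs only a shifter and an adder per term, which is the motivation for the bitshift-add compute strategy described above.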

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2019
Publication date 2019
Issuance monographic
Language English

Creators/Contributors

Author Lee, Edward H
Degree supervisor Wong, S. C
Thesis advisor Wong, S. C
Thesis advisor Murmann, Boris
Thesis advisor Yeom, Kristen
Degree committee member Murmann, Boris
Degree committee member Yeom, Kristen
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Edward H. Lee.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Edward Heesung Lee
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
