Methods for energy-efficient compute and communication in deep learning
Abstract/Contents
- Abstract
- Deep neural networks (DNNs) have recently achieved state-of-the-art accuracy on visual and speech recognition tasks. However, DNNs require the communication of millions of weights and activations and billions of compute operations, and both the compute and the communication designs influence the energy efficiency of DNN hardware. We present ways to reduce the communication overhead and compute cost of DNN inference. To reduce communication overhead, we propose low-precision logarithmic encoding (LogNet) of non-uniformly distributed weights, with minimal loss in classification accuracy compared with conventional fixed-point approaches. For example, even without retraining on Inception-ResNet, 4-bit logarithmic encoding incurs only 3% degradation in top-5 accuracy. For the encoding of both weights and activations, we find that distortion error is a very useful metric in the training of quantized DNNs, and we propose dynamic clipping and dynamic resolution of activations. With all of our approaches combined, an approximately 2-bit ResNet-50 achieves 70% (state-of-the-art) top-1 accuracy; with the proposed techniques on 4-bit weights and activations, we achieve less than 1% degradation in top-1 accuracy on ImageNet and little degradation on object detection tasks. To improve compute, we propose and implement two strategies: 1) bitshift-add, motivated by LogNet, to replace expensive multipliers, and 2) passive charge-domain computation. For the second approach, we design and fabricate a chip that demonstrates energy-efficient matrix operations, the Switched-Capacitor Matrix Multiplier (SCMM), in 40-nanometer CMOS. By replacing expensive digital multipliers and accumulate registers with 40-nanometer switches and 300-attofarad unit capacitors, the SCMM achieves 11 femtojoules per bit, or 5x lower energy than digital, at 2.5 GHz.
Finally, we conclude by expanding our findings to show how LogNet can also achieve more graceful model compression than conventional approaches with limited retraining in data-sensitive environments.
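The core idea behind LogNet described above — encoding weights as signed powers of two so that multiplies reduce to bitshifts — can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation; the function names (`log_quantize`, `shift_mul`), the 4-bit default, and the exponent range are assumptions chosen for clarity.

```python
import numpy as np

def log_quantize(w, num_bits=4):
    """Quantize weights to signed powers of two (illustrative sketch).

    Each nonzero weight w maps to sign(w) * 2**round(log2(|w|)), with the
    exponent clipped to a range representable in (num_bits - 1) bits
    (the remaining bit carries the sign); zero weights stay zero.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.round(np.log2(np.where(mag > 0, mag, 1.0)))
    # Assumed exponent range for this sketch: [-(2**(num_bits-1) - 1), 0].
    max_exp, min_exp = 0, -(2 ** (num_bits - 1) - 1)
    exp = np.clip(exp, min_exp, max_exp)
    return np.where(mag > 0, sign * 2.0 ** exp, 0.0)

def shift_mul(x_int, exp):
    """Multiply a fixed-point (integer) activation by 2**exp via a bitshift:
    x * 2**(-k) == x >> k for integer x and k >= 0, so no multiplier is needed."""
    return x_int >> (-exp) if exp < 0 else x_int << exp
```

For example, `log_quantize` maps 0.3 to 0.25 (exponent -2), and `shift_mul(64, -2)` computes 64 * 2**-2 = 16 with a right shift; in hardware, the multiply-accumulate then needs only shifters and adders.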
Description
Type of resource | text
---|---
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource
Place | California
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | ©2019
Publication date | 2019
Issuance | monographic
Language | English
Creators/Contributors
Author | Lee, Edward H.
---|---
Degree supervisor | Wong, S. C.
Thesis advisor | Wong, S. C.
Thesis advisor | Murmann, Boris
Thesis advisor | Yeom, Kristen
Degree committee member | Murmann, Boris
Degree committee member | Yeom, Kristen
Associated with | Stanford University, Department of Electrical Engineering
Subjects
Genre | Theses
---|---
Genre | Text
Bibliographic information
Statement of responsibility | Edward H. Lee
---|---
Note | Submitted to the Department of Electrical Engineering.
Thesis | Thesis (Ph.D.), Stanford University, 2019.
Location | electronic resource
Access conditions
- Copyright
- © 2019 by Edward Heesung Lee
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).