Brain-inspired computing with resistive non-volatile memories
- Training of large-scale neural networks requires thousands of processors, massive amounts of off-chip memory, and kilowatts of power to complete within a reasonable amount of time. The bulk of this power is consumed in moving data between the fast multicore processors and off-chip memory. After being trained on backend infrastructure, model parameters can be deployed on mobile devices for inference tasks, possibly combined with further training for local customization. However, off-chip memory accesses also dominate the power required on mobile devices, so the size and scale of deployable neural network models are limited by the amount of on-chip memory available, which is itself constrained by the large area of SRAM. For both training and inference tasks, resistive switching memories offer a compact, scalable, and low-power alternative that permits co-located on-chip processing and memory in a fine-grained, distributed, parallel architecture. In this dissertation, approaches to utilizing resistive memories in an analog fashion as synaptic weights in neural networks are described and analyzed. First, proof-of-concept demonstrations of training different types of neural networks with noisy, energy-efficient analog weight updates are presented using fabricated resistive memory devices. Effects of device variation on energy efficiency and training accuracy are measured. Then, using an experimentally validated compact model of metal-oxide resistive switching memory, the effects of device non-idealities on training accuracy and energy efficiency are studied, with a single-layer neural network (a Restricted Boltzmann Machine) as a case study. Requirements for resistive memories, as well as programming schemes that mitigate the effects of device non-idealities, are elucidated. An architecture with resistive memories optimized for inference tasks, the Efficient Resistive Neural Inference Engine, is also presented.
This architecture combines algorithm and hardware co-optimizations: pruning, quantization, and noise injection at the algorithm level, and low-power, low-precision analog and digital circuits at the circuit level. The architecture is then benchmarked against a fully digital CMOS system. The resistive technology outperforms digital CMOS by ~8× if a cell can store a weight with ~6-bit precision and if activations can tolerate 1- or 2-bit precision. If a cell cannot store more than 2 bits, or if activations require more than 3-bit precision, the benefits diminish significantly.
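The algorithm-level co-optimizations named above (pruning, quantization, and noise injection) can be illustrated with a minimal sketch. This is an illustrative assumption, not the dissertation's actual implementation: the function names, the magnitude-pruning criterion, the uniform 6-bit quantizer, and the Gaussian write-noise model are all placeholders standing in for whatever the thesis actually uses.

```python
# Hypothetical sketch of algorithm-level co-optimizations for a resistive
# inference engine: magnitude pruning, uniform quantization to a cell's
# bit precision, and Gaussian noise injection modeling conductance
# variation. All names and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=6):
    """Uniformly quantize weights to 2**bits - 1 levels over [-1, 1],
    mimicking a cell that stores a weight with `bits`-bit precision."""
    levels = 2 ** bits - 1
    w = np.clip(w, -1.0, 1.0)
    return np.round((w + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

def prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights (magnitude pruning)."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def noisy_forward(x, w, noise_std=0.02, bits=6):
    """Forward pass through pruned, quantized weights, with additive
    Gaussian perturbations standing in for device write/read noise."""
    wq = quantize(prune(w), bits=bits)
    wn = wq + rng.normal(0.0, noise_std, size=wq.shape)
    return x @ wn

w = rng.uniform(-1, 1, size=(4, 3))   # toy weight matrix
x = rng.uniform(-1, 1, size=(2, 4))   # toy input batch
y = noisy_forward(x, w)
print(y.shape)  # (2, 3)
```

Injecting this noise during training, rather than only at inference time, is what lets the network tolerate the low-precision analog cells the benchmark above assumes.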
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Eryilmaz, Sukru Burc
|Stanford University, Department of Electrical Engineering.
|Wong, Hon-Sum Philip, 1959-
|Statement of responsibility
|Sukru Burc Eryilmaz.
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.)--Stanford University, 2017.
- © 2017 by Sukru Burc Eryilmaz