RRAM-CMOS integrated hardware for efficient learning and inference at the edge
Abstract/Contents
- Abstract
- Ubiquitous artificial intelligence (AI) promises to empower broad edge applications, from health-monitoring and augmented-reality wearables to autonomous robots. To fulfill these promises, future electronics must deliver unprecedented energy efficiency with new functionalities, enabling real-time adaptation and lifelong learning at the edge. Such growing demands cannot be met by isolated advancements in materials, devices, integrated circuits, and architectures; instead, they require cross-layer design for the next generation of AI hardware. Nanotechnology has much to offer in terms of new materials, new devices, and new integration process technologies. In this dissertation, as a case study for nanotechnology-inspired AI hardware, I present my work on resistive RAM (RRAM)-CMOS integrated hardware for efficient learning and inference at the edge. The overarching methodology developed and illustrated throughout this dissertation is to expose and connect the unique properties of nanotechnologies at the device and circuit levels all the way up to the diverse characteristics of AI models. The resulting building blocks are termed "nano-kernels". Through the work described in Chapters 2 through 4, focusing on RRAM as an example nanotechnology, I discuss the cross-layer design, integration, and creation of new compute kernels and chip architectures, along with physics-backed modeling and design explorations. I present SAPIENS, the first integrated chip that enables on-chip, one-shot learning with scarce and never-before-seen data, built with 65,536 RRAMs and mixed-signal silicon CMOS in a 40-nm process. I describe energy- and area-efficient associative memory design with SAPIENS, leveraging the high-density nano-kernels and cross-layer design to support memory-augmented neural network (MANN) workloads, which are approximate in nature.
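The associative-memory operation at the heart of MANN inference, as described above, can be illustrated with a minimal software sketch: stored binary key vectors are searched by nearest Hamming distance, and one-shot "learning" amounts to writing a single new key into a row. All parameters and names here are illustrative, not SAPIENS specifications.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical associative memory: N_ROWS stored binary keys of N_BITS each
# (sizes are illustrative, not the SAPIENS chip's actual dimensions).
N_ROWS, N_BITS = 128, 64
memory = rng.integers(0, 2, size=(N_ROWS, N_BITS))

def associative_search(query):
    """Return the row index whose stored key best matches the query.

    Hardware performs this distance computation in parallel across rows;
    here it is an explicit Hamming-distance scan for clarity.
    """
    distances = np.count_nonzero(memory != query, axis=1)
    return int(np.argmin(distances))

# One-shot "learning": write a single example's feature vector into a row.
new_key = rng.integers(0, 2, size=N_BITS)
memory[7] = new_key

# Approximate matching: a slightly corrupted query should still retrieve
# the same row, reflecting the workloads' tolerance to approximation.
noisy = new_key.copy()
flip = rng.choice(N_BITS, size=3, replace=False)
noisy[flip] ^= 1
```

The key design point this sketch mirrors is that retrieval is nearest-match rather than exact-match, which is why small device-level errors need not change the classification result.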
SAPIENS achieves software-comparable accuracies based on real-time chip measurements and application-level robustness with over 6 million images tested. The non-volatile nature of SAPIENS with zero standby power, the high density and high-bandwidth parallel access enabled by technology integration, and the scalability provided by the spatial architecture together show a viable path towards lifelong learning at the edge. Towards new AI paradigms beyond neural networks, I then present the first experimental demonstration of hyperdimensional (HD) computing with 3D vertical RRAM (VRRAM) as nano-kernels. I demonstrate that, by exploiting the convergence of physical and algorithmic characteristics, the intrinsic device physics and the native 3D connectivity of VRRAM with silicon transistors can be seamlessly translated into a natural realization of 3D nano-kernels for HD computing. I describe how in-memory random projection is directly enabled by the inherent stochasticity of RRAM, experimentally characterized by a nonlinear voltage-time relationship in the sub-threshold regime. I then demonstrate the complete multiply-accumulate-permute (MAP) kernels, experimentally realized within 3D VRRAMs with robust cycle-to-cycle operation and immunity to device-to-device variations. I show that algorithm-hardware co-design provides opportunities for energy-accuracy trade-offs and that memory-centric HD systems can be extremely resilient to hardware errors. Equally important, I discuss my modeling work centered on RRAM technology across the device, circuit, and system levels for physics-backed design explorations. At the device level, I develop an experimentally calibrated RRAM SPICE model with a hierarchy of three model levels. At the circuit level, I describe a full-array Monte Carlo modeling framework for variation-aware RRAM circuit simulations, studying how intrinsic cell variations translate into circuit behaviors and impact design choices.
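The multiply-accumulate-permute (MAP) kernels mentioned above are the three algebraic primitives of HD computing, and their behavior can be sketched in software with bipolar hypervectors. The dimensionality and random-vector generation below are illustrative stand-ins for the stochastic RRAM-based projection the dissertation realizes in hardware.

```python
import numpy as np

D = 1000  # hypervector dimensionality (illustrative; HD systems often use ~10,000)
rng = np.random.default_rng(0)

def random_hv():
    """Random bipolar hypervector, a software stand-in for RRAM-based projection."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Multiply: element-wise product binds two hypervectors."""
    return a * b

def bundle(hvs):
    """Accumulate: element-wise majority vote superposes hypervectors."""
    return np.sign(np.sum(hvs, axis=0))

def permute(a, shift=1):
    """Permute: cyclic shift encodes sequence or positional information."""
    return np.roll(a, shift)

a, b = random_hv(), random_hv()
bound = bind(a, b)
# Binding is self-inverse for bipolar vectors: binding again with b recovers a.
recovered = bind(bound, b)
# A bundled vector stays similar (large dot product) to each component,
# while independent random hypervectors are near-orthogonal; this margin
# is what makes HD systems resilient to hardware errors.
s = bundle([a, b, permute(a)])
```

Because classification reduces to comparing such dot products against a large random-vector baseline, flipping a modest fraction of components rarely changes the nearest match, which is the error resilience the abstract refers to.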
At the system level, I present technology-system design space explorations analyzing on-chip memory technologies for deep neural network (DNN) accelerator designs. I show that dense memory-compute integration provides ample opportunities for optimal chip resource partitioning.
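The variation-aware Monte Carlo flow described at the circuit level can be sketched as follows: sample per-cell conductances around nominal values, then observe how the device-level spread propagates into the analog bitline current that sense circuits must resolve. All device parameters here are hypothetical placeholders, not the dissertation's calibrated values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative RRAM cell parameters (not calibrated to any real device):
G_LRS, G_HRS = 100e-6, 1e-6   # nominal low/high-resistance-state conductances (S)
SIGMA = 0.3                   # log-normal device-to-device variation (assumed)
ROWS, TRIALS = 64, 1000       # cells per bitline, Monte Carlo trials
V_READ = 0.2                  # read voltage (V)

def sample_column(weights):
    """Draw one Monte Carlo instance of a bitline's cell conductances."""
    nominal = np.where(weights == 1, G_LRS, G_HRS)
    return nominal * rng.lognormal(mean=0.0, sigma=SIGMA, size=weights.shape)

weights = rng.integers(0, 2, size=ROWS)  # stored binary pattern on one bitline
currents = np.array([V_READ * sample_column(weights).sum()
                     for _ in range(TRIALS)])  # summed bitline currents (A)

mean_i, std_i = currents.mean(), currents.std()
# The relative spread (std_i / mean_i) indicates how much cell-level
# variation the downstream sensing and ADC design must tolerate.
```

A full-array framework would repeat this across every column and fold in circuit non-idealities, but even this toy version shows how variation statistics become a circuit-level design constraint.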
Description
Type of resource | text |
---|---|
Form | electronic resource; remote; computer; online resource |
Extent | 1 online resource. |
Place | California |
Place | [Stanford, California] |
Publisher | [Stanford University] |
Copyright date | 2022; ©2022 |
Publication date | 2022 |
Issuance | monographic |
Language | English |
Creators/Contributors
Author | Li, Haitong | |
---|---|---|
Degree supervisor | Wong, Hon-Sum Philip, 1959- | |
Thesis advisor | Wong, Hon-Sum Philip, 1959- | |
Thesis advisor | Mitra, Subhasish | |
Thesis advisor | Raina, Priyanka, (Assistant Professor of Electrical Engineering) | |
Degree committee member | Mitra, Subhasish | |
Degree committee member | Raina, Priyanka, (Assistant Professor of Electrical Engineering) | |
Associated with | Stanford University, Department of Electrical Engineering |
Subjects
Genre | Theses |
---|---|
Genre | Text |
Bibliographic information
Statement of responsibility | Haitong Li. |
---|---|
Note | Submitted to the Department of Electrical Engineering. |
Thesis | Thesis (Ph.D.)--Stanford University, 2022. |
Location | https://purl.stanford.edu/mb108yd4397 |
Access conditions
- Copyright
- © 2022 by Haitong Li
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License (CC BY-NC 3.0).