Low energy inference for always-on tinyML applications
Abstract/Contents
- Abstract
- Internet of Things (IoT) devices and sensors are generating data at an unprecedented scale. Processing all of this data in the cloud is infeasible due to the energy and latency cost of sending it over long distances. tinyML seeks to deploy deep neural networks (DNNs) on low-cost IoT devices to alleviate these challenges by distilling the data into just the necessary and useful information. This is particularly crucial in always-on scenarios, where tinyML systems continuously analyze sensor data and therefore require low-energy processing to extend battery life and minimize overall energy consumption. This thesis studies the optimization of inference energy for always-on tinyML processors, covering insights across the stack, from the network architecture to the hardware architecture to the circuit design of memories and compute. Our contributions include: (1) A study of tinyML network architectures and precision, assessing model size, accuracy, energy, and latency trade-offs, validated by silicon measurements. (2) 6T-latch-based Inner Loop Memories (ILMs) optimized for tinyML with an access cost of only 15 fJ/Byte. (3) A Pipelined Pixel Streaming (PPS) system architecture and dataflow that leverages ILMs to reduce memory access overhead for memory-intensive bottleneck layers. (4) Full-custom bit-serial multipliers and adder trees integrated with ILMs to reduce compute area and energy. These contributions culminate in Medusa, a programmable 8b digital processor fabricated in 28 nm CMOS for always-on tinyML applications that achieves an image-to-class inference energy of 0.83 (4.6) µJ at an accuracy of 86.2% (91.6%) on CIFAR-10 with all parameters stored on chip. Medusa delivers 3.0x lower energy per inference at iso-latency than the current state of the art, and it achieves the goal of sub-nJ-per-pixel always-on tinyML inference across the Image Recognition, Visual Wake Words, and Audio Keyword Spotting workloads from the MLPerf Tiny Inference Benchmark deployed in this work.
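As a rough sanity check on the headline numbers (assuming, as an illustration not stated in this record, that "per pixel" refers to the 32 x 32 spatial pixels of a CIFAR-10 image), the low-energy operating point is consistent with the sub-nJ-per-pixel goal:

```latex
% Back-of-the-envelope check (assumption: one "pixel" is one of the
% 32 x 32 spatial positions of a CIFAR-10 image, i.e. 1024 pixels per inference).
\frac{0.83\ \mu\mathrm{J/inference}}{32 \times 32\ \mathrm{pixels/inference}}
  \approx 0.81\ \mathrm{nJ/pixel} \;<\; 1\ \mathrm{nJ/pixel}
```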
Description
Field | Value
---|---
Type of resource | text
Form | electronic resource; remote; computer; online resource
Extent | 1 online resource.
Place | California
Place | [Stanford, California]
Publisher | [Stanford University]
Copyright date | 2024; ©2024
Publication date | 2024
Issuance | monographic
Language | English
Creators/Contributors
Role | Name
---|---
Author | Doshi, Rohan
Degree supervisor | Horowitz, Mark (Mark Alan)
Degree supervisor | Murmann, Boris
Thesis advisor | Horowitz, Mark (Mark Alan)
Thesis advisor | Murmann, Boris
Thesis advisor | Arbabian, Amin
Degree committee member | Arbabian, Amin
Associated with | Stanford University, School of Engineering
Associated with | Stanford University, Department of Electrical Engineering
Subjects
Field | Value
---|---
Genre | Theses
Genre | Text
Bibliographic information
Field | Value
---|---
Statement of responsibility | Rohan Doshi.
Note | Submitted to the Department of Electrical Engineering.
Thesis | Thesis (Ph.D.), Stanford University, 2024.
Location | https://purl.stanford.edu/nr286tt5779
Access conditions
- Copyright
- © 2024 by Rohan P Doshi
- License
- This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).