Low energy inference for always-on tinyML applications

Abstract
Internet of Things (IoT) devices and sensors are generating data at an unprecedented scale. Processing all of this data in the cloud is infeasible due to the energy and latency cost of sending it over long distances. tinyML seeks to deploy deep neural networks (DNNs) on low-cost IoT devices in order to alleviate these challenges by distilling the data into just the necessary and useful information. This is particularly crucial in always-on scenarios where tinyML systems continuously analyze sensor data, necessitating low-energy processing to extend battery life and minimize overall energy consumption. This thesis studies the optimization of inference energy for always-on tinyML processors, covering insights across the stack from the network architecture, to the hardware architecture, to the circuit design of memories and compute. Our contributions include: (1) A study on tinyML network architectures and precision, assessing model size, accuracy, energy, and latency trade-offs, validated by silicon measurements. (2) 6T-latch-based Inner Loop Memories (ILMs) optimized for tinyML with an access cost of only 15 fJ/Byte. (3) A Pipelined Pixel Streaming (PPS) system architecture and dataflow that leverages ILMs to reduce memory access overhead for memory-intensive bottleneck layers. (4) Full-custom bit-serial multipliers and adder trees integrated with ILMs to reduce compute area and energy. These contributions culminate in Medusa, a programmable 8b digital processor fabricated in 28 nm CMOS for always-on tinyML applications that achieves an image-to-class inference energy of 0.83 (4.6) µJ at an accuracy of 86.2% (91.6%) on CIFAR-10 with all parameters stored on chip. Medusa achieves 3.0× lower energy per inference at iso-latency compared with the current state of the art, and meets the goal of sub-nJ-per-pixel always-on tinyML inference across the MLPerf Tiny Inference Benchmark workloads deployed in this work: Image Recognition, Visual Wake Words, and Audio Keyword Spotting.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2024
Publication date 2024
Issuance monographic
Language English

Creators/Contributors

Author Doshi, Rohan
Degree supervisor Horowitz, Mark (Mark Alan)
Degree supervisor Murmann, Boris
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Murmann, Boris
Thesis advisor Arbabian, Amin
Degree committee member Arbabian, Amin
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Rohan Doshi.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2024.
Location https://purl.stanford.edu/nr286tt5779

Access conditions

Copyright
© 2024 by Rohan P Doshi
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
