Low energy inference for always-on tinyML applications

Abstract
Internet of Things (IoT) devices and sensors are generating data at an unprecedented scale. Processing all of this data in the cloud is infeasible due to the energy and latency cost of sending it over long distances. tinyML seeks to deploy deep neural networks (DNNs) on low-cost IoT devices in order to alleviate these challenges by distilling the data into just the necessary and useful information. This is particularly crucial in always-on scenarios where tinyML systems continuously analyze sensor data, necessitating low-energy processing to extend battery life and minimize overall energy consumption. This thesis studies the optimization of inference energy for always-on tinyML processors, covering insights across the stack from the network architecture, to the hardware architecture, to the circuit design of memories and compute. Our contributions include: (1) A study on tinyML network architectures and precision, assessing model size, accuracy, energy, and latency trade-offs, validated by silicon measurements. (2) 6T-latch-based Inner Loop Memories (ILMs) optimized for tinyML with an access cost of only 15 fJ/Byte. (3) A Pipelined Pixel Streaming (PPS) system architecture and dataflow that leverages ILMs to reduce memory access overhead for memory-intensive bottleneck layers. (4) Full-custom bit-serial multipliers and adder trees integrated with ILMs to reduce compute area and energy. These contributions culminate in Medusa, a programmable 8b digital processor fabricated in 28 nm CMOS for always-on tinyML applications that achieves an image-to-class inference energy of 0.83 (4.6) µJ at an accuracy of 86.2% (91.6%) on CIFAR-10 with all parameters stored on chip. Medusa achieves 3.0× lower energy per inference at iso-latency compared with the current state of the art, and meets the goal of sub-nJ-per-pixel always-on tinyML inference across the MLPerf Tiny Inference Benchmark workloads deployed in this work: Image Recognition, Visual Wake Words, and Audio Keyword Spotting.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2024
Publication date 2024
Issuance monographic
Language English

Creators/Contributors

Author Doshi, Rohan
Degree supervisor Horowitz, Mark (Mark Alan)
Degree supervisor Murmann, Boris
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Murmann, Boris
Thesis advisor Arbabian, Amin
Degree committee member Arbabian, Amin
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Rohan Doshi.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2024.
Location https://purl.stanford.edu/nr286tt5779

Access conditions

Copyright
© 2024 by Rohan P Doshi
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
