Energy, latency, and silicon area trade-off analysis for tinyML compute architectures
- This dissertation explores the design of energy-efficient deep neural network (DNN) accelerators for edge computing, emphasizing real-time inference under strict power constraints. Motivated by the shift from cloud-based systems to edge devices, it addresses the challenge of embedding DNN models in devices with limited memory and compute capacity. The research introduces the tiny machine learning (tinyML) paradigm and custom hardware that supports both on-device AI inference and on-device training. For devices powered by limited energy sources such as coin-cell batteries, a system architecture is proposed that balances continuously running and event-triggered DNN models to optimize power usage, targeting an 80-microwatt average power budget to sustain a one-year battery life.

The dissertation presents AI device architectures spanning compute elements, memory hierarchies, and dataflows, and details advances in ML algorithms, near-memory computing, and system-level efficiency that address data-movement bottlenecks. A holistic approach seeks architectures that strike an optimal balance between silicon area and energy efficiency, advancing the performance-cost trade-off of tinyML accelerators. The research maps the design space of tinyML hardware accelerators and examines the benefits of non-volatile memories, custom latch arrays (CLAs), and multicore systems tailored for tinyML. It also discusses future directions, such as larger-model integration and heterogeneous memory systems, for improved performance. The goal is to enable cost-effective, efficient DNN accelerators that run from small batteries and can adapt and learn in their environments.

The dissertation introduces CHIMERA, a silicon AI accelerator implemented in 40 nm ULP CMOS that integrates 2 MB of on-chip RRAM for DNN weight storage and a RISC-V core.
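The 80-microwatt, one-year budget follows from simple coin-cell arithmetic; a minimal sketch, assuming a nominal CR2032-class cell of 225 mAh at 3.0 V (illustrative values, not stated in the dissertation):

```python
# Back-of-the-envelope check: how long a coin cell lasts at a given average power.
# Assumed cell: CR2032-class, ~225 mAh at a nominal 3.0 V (illustrative values).

def battery_life_days(avg_power_w, capacity_mah=225, voltage_v=3.0):
    """Days of operation for a given average power draw in watts."""
    energy_j = capacity_mah * 1e-3 * 3600 * voltage_v  # mAh -> joules
    return energy_j / avg_power_w / 86400              # seconds -> days

print(f"{battery_life_days(80e-6):.0f} days")  # prints "352 days", close to one year
```

The 2430 J stored in such a cell, drained at 80 µW, lasts roughly 351 days, which is where the one-year target for the average power budget comes from.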
CHIMERA consumes 126 mW at 1.1 V and 200 MHz and achieves 2.2 TOPS/W energy efficiency on ResNet-18. The work explores power gating and multi-chip scaling, and proposes a Low-Rank Training algorithm to overcome RRAM's high write energy and latency.

To study the trade-offs of hardware accelerators for tinyML applications and reach sub-microjoule-per-inference operation, crucial for persistent sensor-data monitoring in IoT, a design space exploration framework named tinyForge was devised. tinyForge demonstrates the advantages of custom latch arrays (CLAs), which offer 2.7x higher density and 5x lower read-access energy than traditional synthesized memories. Leveraging the NSGA-II genetic algorithm, tinyForge performs a multi-objective optimization over energy, latency, and silicon area, identifying Pareto-optimal architectures in fewer than a thousand evaluations and significantly shortening the design cycle.

Building on the insights from tinyForge, Medusa is an accelerator tailored for tinyML applications. By integrating the CLAs identified by tinyForge, Medusa achieves a 4x reduction in memory capacity and a 7x reduction in memory accesses. Validated on a 28 nm CMOS test chip, Medusa demonstrates energy efficiency below 0.83 µJ per CIFAR-10 inference, substantially surpassing existing solutions.
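At the core of a multi-objective search like the one tinyForge runs is a Pareto-dominance test over (energy, latency, area) points: a design is kept only if no other design is at least as good in every objective and strictly better in one. An illustrative sketch of that filter, with hypothetical candidate tuples (this is not tinyForge's actual implementation; NSGA-II itself is available in libraries such as pymoo):

```python
# Pareto dominance over (energy_uJ, latency_ms, area_mm2) tuples, minimizing all three.
# Illustrative sketch with made-up numbers, not tinyForge's actual code.

def dominates(a, b):
    """True if design a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep only the non-dominated designs."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other != d)]

# Hypothetical candidate architectures: (energy_uJ, latency_ms, area_mm2)
candidates = [(0.8, 2.0, 1.1), (1.2, 1.5, 0.9), (1.0, 2.5, 1.2), (0.9, 1.8, 1.0)]
print(pareto_front(candidates))  # (1.0, 2.5, 1.2) is dominated and filtered out
```

NSGA-II layers non-dominated sorting and crowding-distance selection on top of this test, which is how the search converges on the energy/latency/area front in relatively few evaluations.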
Medusa's ability to execute full workloads on edge devices confirms both its suitability for embedded tinyML tasks and its readiness for real-world deployment.
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Statement of responsibility
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.), Stanford University, 2023.
- © 2023 by Massimo Giordano
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).