Energy, latency, and silicon area trade-off analysis for tinyML compute architectures
- This dissertation explores the design of energy-efficient deep neural network (DNN) accelerators for edge computing, emphasizing real-time inference under strict power constraints. Motivated by the shift from cloud-based systems to edge devices, it addresses the challenge of embedding DNN models in devices with limited memory and compute capacity. The research introduces the tiny machine learning (tinyML) paradigm and custom hardware that supports both on-device AI inference and on-device training. For devices powered by limited energy sources such as coin-cell batteries, a system architecture is proposed that balances continuously running and event-triggered DNN models to optimize power usage, targeting an 80-microwatt average power budget to sustain a one-year battery life.

The dissertation presents AI device architectures spanning compute elements, memory hierarchies, and dataflows, and details advances in ML algorithms, near-memory computing, and system-level efficiency that address data-movement bottlenecks. A holistic approach seeks architectures that strike an optimal balance between silicon area and energy efficiency, advancing the performance-cost trade-off of tinyML accelerators. The research maps the design space of tinyML hardware accelerators and examines the benefits of non-volatile memories, custom latch arrays (CLAs), and multicore systems tailored for tinyML. It also discusses future directions, such as larger-model integration and heterogeneous memory systems, for improved performance. The goal is to enable cost-effective, efficient DNN accelerators that run from small batteries and can adapt and learn in their environments.

The dissertation introduces CHIMERA, a silicon AI accelerator implemented in 40 nm ULP CMOS that integrates 2 MB of on-chip RRAM for DNN weight storage and a RISC-V core.
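The 80-microwatt, one-year budget follows from simple coin-cell arithmetic; a minimal sketch, assuming a nominal CR2032-class cell of 225 mAh at 3.0 V (illustrative values, not stated in the dissertation):

```python
# Back-of-the-envelope check: how long a coin cell lasts at a given average power.
# Assumed cell: CR2032-class, ~225 mAh at a nominal 3.0 V (illustrative values).

def battery_life_days(avg_power_w, capacity_mah=225, voltage_v=3.0):
    """Days of operation for a given average power draw in watts."""
    energy_j = capacity_mah * 1e-3 * 3600 * voltage_v  # mAh -> joules
    return energy_j / avg_power_w / 86400              # seconds -> days

print(f"{battery_life_days(80e-6):.0f} days")  # prints "352 days", close to one year
```

The 2430 J stored in such a cell, drained at 80 µW, lasts roughly 351 days, which is where the one-year target for the average power budget comes from.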
CHIMERA consumes 126 mW at 1.1 V and 200 MHz and achieves 2.2 TOPS/W energy efficiency on ResNet-18. The work explores power gating and multi-chip scaling, and proposes a Low-Rank Training algorithm to overcome RRAM's high write energy and latency.

To study the trade-offs of hardware accelerators for tinyML applications and reach sub-microjoule-per-inference operation, crucial for persistent sensor-data monitoring in IoT, a design space exploration framework named tinyForge was devised. tinyForge demonstrates the advantages of custom latch arrays (CLAs), which offer 2.7x higher density and 5x lower read-access energy than traditional synthesized memories. Leveraging the NSGA-II genetic algorithm, tinyForge performs a multi-objective optimization over energy, latency, and silicon area, identifying Pareto-optimal architectures in fewer than a thousand evaluations and significantly shortening the design cycle.

Building on the insights from tinyForge, Medusa is an accelerator tailored for tinyML applications. By integrating the CLAs identified by tinyForge, Medusa achieves a 4x reduction in memory capacity and a 7x reduction in memory accesses. Validated on a 28 nm CMOS test chip, Medusa demonstrates energy efficiency below 0.83 µJ per CIFAR-10 inference, substantially surpassing existing solutions.
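At the core of a multi-objective search like the one tinyForge runs is a Pareto-dominance test over (energy, latency, area) points: a design is kept only if no other design is at least as good in every objective and strictly better in one. An illustrative sketch of that filter, with hypothetical candidate tuples (this is not tinyForge's actual implementation; NSGA-II itself is available in libraries such as pymoo):

```python
# Pareto dominance over (energy_uJ, latency_ms, area_mm2) tuples, minimizing all three.
# Illustrative sketch with made-up numbers, not tinyForge's actual code.

def dominates(a, b):
    """True if design a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep only the non-dominated designs."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other != d)]

# Hypothetical candidate architectures: (energy_uJ, latency_ms, area_mm2)
candidates = [(0.8, 2.0, 1.1), (1.2, 1.5, 0.9), (1.0, 2.5, 1.2), (0.9, 1.8, 1.0)]
print(pareto_front(candidates))  # (1.0, 2.5, 1.2) is dominated and filtered out
```

NSGA-II layers non-dominated sorting and crowding-distance selection on top of this test, which is how the search converges on the energy/latency/area front in relatively few evaluations.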
Medusa's ability to execute full workloads on edge devices confirms both its suitability for embedded tinyML tasks and its readiness for real-world deployment.
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Statement of responsibility
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.), Stanford University, 2023.
- © 2023 by Massimo Giordano
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).