Scaling a reconfigurable dataflow accelerator



With the slowdown of Moore's Law, specialized hardware accelerators are gaining traction for delivering 100-1000x performance improvements over general-purpose processors in a variety of application domains, such as cloud computing, biocomputing, and artificial intelligence. As performance scaling in multicores reaches its limit, a new class of accelerators, reconfigurable dataflow architectures (RDAs), offers high-throughput and energy-efficient acceleration that keeps up with performance demand. Instead of dynamically fetching instructions as traditional processors do, RDAs have flexible data paths that can be statically configured to spatially parallelize and pipeline programs across distributed on-chip resources. The pipelined execution model and explicitly managed scratchpads in RDAs eliminate the performance, area, and energy overhead of dynamic scheduling and a conventional memory hierarchy. To adapt to the compute intensity of modern data-analytic workloads, particularly in the deep learning domain, RDAs have grown to an unprecedented scale. With an area footprint of 133 mm^2 at 28 nm, Plasticine is a previously proposed large-scale RDA supplying 12.3 TFLOPS of computing power. Prior work has shown up to a 76x performance-per-watt benefit of Plasticine over a Stratix V FPGA, due to an advantage in clock frequency and resource density. The increase in scale introduces new challenges in network-on-chip design to maintain the throughput and energy efficiency of an RDA. Furthermore, targeting and managing RDAs at this scale requires new strategies in mapping, memory management, and flexible control to fully utilize their compute power. In this work, we focus on two aspects of software-hardware co-design that impact the usability and scalability of the Plasticine accelerator.
Although RDAs are flexible enough to support a wide range of applications, the biggest challenge hindering their adoption is the low-level knowledge of microarchitecture design and hardware constraints required to map a new application efficiently. To address this challenge, we introduce a compiler stack, SARA, that raises the programming abstraction of Plasticine to an imperative-style domain-specific language with nested control flow for general spatial architectures. The abstraction is architecture-agnostic and contains explicit loop constructs that enable cross-kernel optimizations often not exploited on RDAs. SARA efficiently translates imperative control constructs to a streaming dataflow graph that scales performance with distributed on-chip resources. By virtualizing resources, SARA systematically handles hardware constraints, hiding the low-level architecture-specific restrictions from programmers. To address the scalability challenge of increasing chip size, we present a comprehensive study of the network-on-chip design space for RDAs. We found that, for RDAs with a streaming dataflow execution model, network performance correlates strongly with bandwidth rather than latency. Lastly, we show that a static-dynamic hybrid network design can sustain performance in a scalable fashion with high energy efficiency.


Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2020; ©2020
Publication date 2020; 2020
Issuance monographic
Language English


Author Zhang, Yaqi
Degree supervisor Olukotun, Oyekunle Ayinde
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Mitra, Subhasish
Thesis advisor Zaharia, Matei
Degree committee member Mitra, Subhasish
Degree committee member Zaharia, Matei
Associated with Stanford University, Department of Electrical Engineering.


Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Yaqi Zhang
Note Submitted to the Department of Electrical Engineering
Thesis Thesis Ph.D. Stanford University 2020
Location electronic resource

Access conditions

© 2020 by Yaqi Zhang
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

