Scaling a reconfigurable dataflow accelerator


Abstract/Contents

Abstract
With the slowdown of Moore's Law, specialized hardware accelerators are gaining traction, delivering 100-1000x performance improvements over general-purpose processors in a variety of application domains such as cloud computing, biocomputing, and artificial intelligence. As performance scaling in multicores reaches its limit, a new class of accelerators---reconfigurable dataflow architectures (RDAs)---offers high-throughput and energy-efficient acceleration that keeps up with the performance demand. Instead of dynamically fetching instructions as traditional processors do, RDAs have flexible data paths that can be statically configured to spatially parallelize and pipeline programs across distributed on-chip resources. The pipelined execution model and explicitly managed scratchpads in RDAs eliminate the performance, area, and energy overheads of dynamic scheduling and a conventional memory hierarchy.

To match the compute intensity of modern data-analytic workloads, particularly in the deep learning domain, RDAs have grown to an unprecedented scale. With an area footprint of 133 mm^2 at 28 nm, Plasticine is a previously proposed large-scale RDA supplying 12.3 TFLOPS of computing power. Prior work has shown a performance/watt benefit of up to 76x for Plasticine over a Stratix V FPGA, owing to advantages in clock frequency and resource density. This increase in scale introduces new challenges in network-on-chip design to maintain the throughput and energy efficiency of an RDA. Furthermore, targeting and managing RDAs at this scale require new strategies in mapping, memory management, and flexible control to fully utilize their compute power.

In this work, we focus on two aspects of software-hardware co-design that impact the usability and scalability of the Plasticine accelerator. Although RDAs are flexible enough to support a wide range of applications, the biggest challenge hindering their adoption is the low-level knowledge of microarchitecture design and hardware constraints required to map a new application efficiently. To address this challenge, we introduce SARA, a compiler stack that raises the programming abstraction of Plasticine to an imperative-style domain-specific language with nested control flow for general spatial architectures. The abstraction is architecture-agnostic and contains explicit loop constructs that enable cross-kernel optimizations often not exploited on RDAs. SARA efficiently translates imperative control constructs into a streaming dataflow graph whose performance scales with distributed on-chip resources. By virtualizing resources, SARA systematically handles hardware constraints, hiding low-level architecture-specific restrictions from programmers.

To address the scalability challenge of increasing chip size, we present a comprehensive study of the network-on-chip design space for RDAs. We find that network performance correlates strongly with bandwidth rather than latency for RDAs with a streaming dataflow execution model. Lastly, we show that a static-dynamic hybrid network design can sustain performance in a scalable fashion with high energy efficiency.
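
Note (illustrative example) To give a concrete sense of the imperative, nested-control input that SARA is described as compiling to a streaming dataflow graph, the sketch below shows a tiled dot-product kernel written in the style of the Spatial DSL (the Scala-embedded language commonly used to program Plasticine). The construct names (Accel, Reduce, Reg, SRAM, DRAM, ArgOut, load) follow public Spatial examples rather than this thesis; the app boilerplate and imports are omitted, and the kernel, sizes, and types are hypothetical.

    // Hypothetical tiled dot product in a Spatial-style DSL (app boilerplate and imports omitted).
    // The outer loop streams tiles from DRAM into scratchpads; the inner loop reduces each tile.
    val N   = 1024                      // problem size (assumed)
    val T   = 64                        // on-chip tile size (assumed)
    val a   = DRAM[Float](N)            // off-chip input vectors
    val b   = DRAM[Float](N)
    val out = ArgOut[Float]             // scalar result returned to the host

    Accel {
      out := Reduce(Reg[Float](0))(N by T) { i =>   // outer loop over tiles
        val aTile = SRAM[Float](T)                  // explicitly managed scratchpads
        val bTile = SRAM[Float](T)
        aTile load a(i :: i + T)                    // burst load DRAM -> SRAM
        bTile load b(i :: i + T)
        Reduce(Reg[Float](0))(T by 1) { j =>        // inner reduction over one tile
          aTile(j) * bTile(j)
        } { _ + _ }
      } { _ + _ }
    }

As the abstract describes, a compiler such as SARA would partition and pipeline an explicit loop nest like this across distributed compute and memory units, turning the outer and inner reductions into streaming dataflow rather than sequential execution.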

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2020
Publication date 2020
Issuance monographic
Language English

Creators/Contributors

Author Zhang, Yaqi
Degree supervisor Olukotun, Oyekunle Ayinde
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Mitra, Subhasish
Thesis advisor Zaharia, Matei
Degree committee member Mitra, Subhasish
Degree committee member Zaharia, Matei
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Yaqi Zhang
Note Submitted to the Department of Electrical Engineering
Thesis Thesis (Ph.D.)--Stanford University, 2020.
Location electronic resource

Access conditions

Copyright
© 2020 by Yaqi Zhang
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
