Compiling applications to reconfigurable push-memory accelerators
- The slowing of Moore's law and the evolution of applications have underscored the growing importance of domain-specific architectures. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle ground between efficiency and flexibility, but they have traditionally been difficult compiler targets because they use a different memory system. In contrast to general-purpose compute platforms, the memory hierarchies of domain-specific accelerators use push memories, which send data streams to computation kernels or to other levels in the memory hierarchy. To address the compilation challenge posed by push memories, this thesis introduces a novel abstraction, the unified buffer, designed to support the compilation of applications to reconfigurable accelerators as well as the physical hardware design of reconfigurable architectures. The unified buffer abstraction enables the compiler to separate generic application scheduling optimizations from the mapping to specific memory implementations in the backend. This approach automates push-memory scheduling optimization through a collection of compiler techniques, including polyhedral analysis and software pipelining, effectively shielding users from low-level hardware details. This separation also allows our compiler to bridge the gap between a resource-agnostic application description and a resource-constrained hardware implementation, mapping applications to different CGRA memory designs, including some with a ready-valid interface. Furthermore, the separation opens the opportunity to optimize the push-memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer (PUB), uses a wide-fetch, single-port SRAM macro and is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory.
Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7x better runtime and 3.5x better energy efficiency compared to an FPGA.
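To make the abstract's notion of compiler-generated push-memory streams concrete, the sketch below enumerates the addresses of an affine access pattern, the kind of schedule a unified buffer's address generator would replay to push data at a computation kernel. This is an illustrative toy, not the thesis's actual compiler API; the function name and parameters are assumptions.

```python
from itertools import product

def affine_addresses(extents, strides, offset=0):
    """Enumerate addresses of an affine access pattern:
    addr = offset + sum(i_k * stride_k) over the iteration domain.
    A push memory replays such a pattern to stream data to a kernel,
    so the compiler can reason about the schedule without knowing
    the physical memory implementation underneath.
    (Illustrative sketch; names are not from the thesis.)"""
    for idx in product(*(range(e) for e in extents)):
        yield offset + sum(i * s for i, s in zip(idx, strides))

# Dense row-major read of a 3x4 tile: addresses 0..11 in order.
dense = list(affine_addresses(extents=(3, 4), strides=(4, 1)))

# Strided 2x2 subsampling of an 8-wide image: 0, 2, 8, 10.
strided = list(affine_addresses(extents=(2, 2), strides=(8, 2)))
```

Keeping the pattern symbolic like this is what lets a backend later bind the same schedule to, say, a wide-fetch single-port SRAM instead of a dual-port memory.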
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Statement of responsibility
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.)--Stanford University, 2023.
- © 2023 by Qiaoyi Liu
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).