Compiling applications to reconfigurable push-memory accelerators



Abstract
The slowing of Moore's law and the evolution of applications have underscored the increasing significance of domain-specific architectures. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle ground between efficiency and flexibility, but they have traditionally been difficult compiler targets because they use a different memory system. In contrast to general-purpose compute platforms, the memory hierarchies of domain-specific accelerators use push memories, which send data streams to computation kernels or to other levels in the memory hierarchy. To address the compilation challenge posed by push memories, this thesis introduces a novel abstraction, the unified buffer, designed to support the compilation of applications to reconfigurable accelerators and to facilitate physical hardware design for reconfigurable architectures. The unified buffer abstraction enables the compiler to separate generic application scheduling optimizations from the mapping to specific memory implementations in the backend. This approach automates push memory scheduling optimization through a collection of compiler techniques, including polyhedral analysis and software pipelining, effectively shielding users from low-level hardware details. This separation also allows our compiler to bridge the gap between resource-agnostic application descriptions and resource-constrained hardware implementations, mapping applications to different CGRA memory designs, including some with a ready-valid interface. Furthermore, the separation opens the opportunity to optimize push memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer (PUB), uses a wide-fetch, single-port SRAM macro and is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory.
Finally, our system evaluation shows that enabling a compiler to support CGRAs yields performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7x better runtime and 3.5x better energy efficiency compared to an FPGA.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2023; ©2023
Publication date 2023
Issuance monographic
Language English

Creators/Contributors

Author Liu, Qiaoyi
Degree supervisor Horowitz, Mark (Mark Alan)
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Kjolstad, Fredrik
Thesis advisor Raina, Priyanka (Assistant Professor of Electrical Engineering)
Degree committee member Kjolstad, Fredrik
Degree committee member Raina, Priyanka (Assistant Professor of Electrical Engineering)
Associated with Stanford University, School of Engineering
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Qiaoyi Liu.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.), Stanford University, 2023.
Location https://purl.stanford.edu/xt429yq8821

Access conditions

Copyright
© 2023 by Qiaoyi Liu
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
