Compiling applications to reconfigurable push-memory accelerators
- The slowing of Moore's law and the evolution of applications have underscored the growing importance of domain-specific architectures. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle ground between efficiency and flexibility, but they have traditionally been difficult compiler targets because they use a different memory system. In contrast to general-purpose compute platforms, the memory hierarchies of domain-specific accelerators use push memories, which send data streams to computation kernels or to other levels in the memory hierarchy. To address the compilation challenge posed by push memories, this thesis introduces a novel abstraction, the unified buffer, designed to support the compilation of applications to reconfigurable accelerators as well as the physical hardware design of reconfigurable architectures. The unified buffer abstraction enables the compiler to separate generic application scheduling optimizations from the mapping to specific memory implementations in the backend. This approach automates push-memory scheduling optimization through a collection of compiler techniques, including polyhedral analysis and software pipelining, effectively shielding users from low-level hardware details. This separation also allows our compiler to bridge the gap between a resource-agnostic application description and a resource-constrained hardware implementation, mapping applications to different CGRA memory designs, including some with a ready-valid interface. Furthermore, the separation opens the opportunity to optimize the push-memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer (PUB), uses a wide-fetch, single-port SRAM macro and is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory.
Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7x better runtime and 3.5x better energy efficiency compared to an FPGA.
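To make the abstract's notion of compiler-generated push-memory streams concrete, the sketch below enumerates the addresses of an affine access pattern, the kind of schedule a unified buffer's address generator would replay to push data at a computation kernel. This is an illustrative toy, not the thesis's actual compiler API; the function name and parameters are assumptions.

```python
from itertools import product

def affine_addresses(extents, strides, offset=0):
    """Enumerate addresses of an affine access pattern:
    addr = offset + sum(i_k * stride_k) over the iteration domain.
    A push memory replays such a pattern to stream data to a kernel,
    so the compiler can reason about the schedule without knowing
    the physical memory implementation underneath.
    (Illustrative sketch; names are not from the thesis.)"""
    for idx in product(*(range(e) for e in extents)):
        yield offset + sum(i * s for i, s in zip(idx, strides))

# Dense row-major read of a 3x4 tile: addresses 0..11 in order.
dense = list(affine_addresses(extents=(3, 4), strides=(4, 1)))

# Strided 2x2 subsampling of an 8-wide image: 0, 2, 8, 10.
strided = list(affine_addresses(extents=(2, 2), strides=(8, 2)))
```

Keeping the pattern symbolic like this is what lets a backend later bind the same schedule to, say, a wide-fetch single-port SRAM instead of a dual-port memory.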
|Type of resource
|electronic resource; remote; computer; online resource
|1 online resource.
|Degree committee member
|Stanford University, School of Engineering
|Stanford University, Department of Electrical Engineering
|Statement of responsibility
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.)--Stanford University, 2023.
- © 2023 by Qiaoyi Liu
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).