Accelerator architectures for matrix applications

Abstract/Contents

Abstract
Matrices are a well-known data representation used extensively in a wide range of applications. Numerous applications across many domains use matrix operations to implement their core algorithms. Improving matrix operation performance is therefore critical to a vast variety of fields: it not only allows existing applications to run faster, but also enables computations with larger matrices. Modern GPUs and CPUs with SIMD support have been very effective at accelerating matrix operations. However, these architectures only work well on dense, fat matrices. Skinny dense matrices tend to underutilize SIMD resources when the width of a matrix is less than the number of SIMD lanes, and they may limit scalability because they have less computation with which to hide communication overhead. Sparse matrices are also difficult to accelerate on current architectures, because their memory accesses are irregular and their workload imbalance is severe. This thesis introduces two specialized hardware designs, targeting narrow dense matrices and sparse matrices respectively.

The first part of this thesis focuses on accelerating a Restricted Boltzmann Machine (RBM), a popular machine learning algorithm used in deep learning. The RBM accelerator was designed with a modular approach to achieve linear scalability across transistor technologies, as well as across chip boundaries. The accelerator was implemented on FPGAs to demonstrate its performance improvements over high-end CPUs and GPUs. Both fat and skinny matrices were shown to fully utilize the computation resources during learning, which allows the training algorithm to converge in fewer iterations.

The second part of this thesis describes how sparse matrix applications can be accelerated with domain-specific hardware. We studied three sparse matrix applications that conventional hardware cannot easily accelerate. Based on our findings, we devised an accelerator architecture that targets certain sparse and dense matrix operations. The accelerator exploits the fine-grained parallelism within sparse matrices, despite their irregularity, through buffering and work-stealing. To cover a wider range of applications, a small general-purpose core was added to the accelerator for non-critical execution flows. The sparse matrix accelerator was implemented on an FPGA board as an ASIC prototype to evaluate its performance on real-world data. Our accelerator shows performance comparable to GPUs on dense matrix operations, and excels over conventional hardware on sparse matrix operations.
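The thesis text itself is not part of this record, so the following is only a minimal NumPy sketch of the kind of computation the first part of the abstract describes: one contrastive-divergence (CD-1) update for a Bernoulli-Bernoulli RBM, whose cost is dominated by dense matrix products. All names, dimensions, and hyperparameters here are illustrative assumptions, not the accelerator's actual design; note that when the mini-batch is small, the products involve skinny matrices of exactly the shape the abstract says underutilizes SIMD hardware.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_v, b_h, v0, lr=0.1, rng=np.random):
    """One CD-1 update for a Bernoulli RBM (hypothetical sketch).

    v0: (batch, n_visible) mini-batch. The dominant cost is the dense
    matrix products below; a small batch makes them skinny."""
    # Positive phase: hidden activations given the data
    h0_prob = sigmoid(v0 @ W + b_h)                      # (batch, n_hidden)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step back to visible, then hidden
    v1_prob = sigmoid(h0 @ W.T + b_v)                    # (batch, n_visible)
    h1_prob = sigmoid(v1_prob @ W + b_h)                 # (batch, n_hidden)
    # Gradient estimates and parameter updates
    batch = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    b_v += lr * (v0 - v1_prob).mean(axis=0)
    b_h += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_v, b_h

# Illustrative usage on random data (dimensions are arbitrary)
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 256))
b_v, b_h = np.zeros(784), np.zeros(256)
v0 = (rng.random((64, 784)) < 0.5).astype(np.float64)
W, b_v, b_h = cd1_step(W, b_v, b_h, v0, rng=rng)
```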
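Likewise, a minimal sketch of sparse matrix-vector multiplication over CSR storage makes concrete the irregularity the second part of the abstract refers to. The CSR layout is standard, but the function name and data are assumptions; the accelerator's buffering and work-stealing scheme is only noted in the comments, not implemented here.

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for A in CSR form (illustrative sketch).

    The indirect reads x[indices[...]] and the variable row lengths
    (indptr[i+1] - indptr[i]) are what make SpMV irregular: memory
    accesses are data-dependent and per-row work is unbalanced, which
    the thesis addresses with buffering and work-stealing across lanes."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows, dtype=data.dtype)
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]    # row i's nonzeros
        # Gather: x entries are read through the column-index array
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y

# A = [[10, 0, 0, 2], [0, 0, 0, 0], [3, 4, 0, 0]] in CSR form
indptr  = np.array([0, 2, 2, 4])
indices = np.array([0, 3, 0, 1])
data    = np.array([10.0, 2.0, 3.0, 4.0])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(spmv_csr(indptr, indices, data, x))  # [18.  0. 11.]
```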

Description

Type of resource text
Form electronic; electronic resource; remote
Extent 1 online resource.
Publication date 2013
Issuance monographic
Language English

Creators/Contributors

Associated with Kim, Sang Kyun
Associated with Stanford University, Department of Electrical Engineering
Primary advisor Olukotun, Oyekunle Ayinde
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Ng, Andrew Y., 1976-

Subjects

Genre Theses

Bibliographic information

Statement of responsibility Sang Kyun Kim.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2013.
Location electronic resource

Access conditions

Copyright
© 2013 by Sang Kyun Kim
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
