Hardware acceleration for graph neural networks


Abstract/Contents

Abstract
Traditional deep neural networks (DNNs) rely on regularly structured inputs such as vectors, images, or sequences. This reliance on regularity makes them difficult to use in domains where data is naturally irregular, such as connections on social media. Graph neural networks (GNNs) extend DNNs to operate on arbitrarily structured graph-valued data. However, GNNs do not accelerate efficiently on CPUs, GPUs, and DNN accelerators such as the TPU, due to their irregular and input-dependent pattern of computation. As a result, GNNs have much higher inference latency than other types of DNNs in practice. This limits their use to applications where inference can be precomputed offline. This dissertation presents hardware and software techniques to reduce the inference latency of GNNs. To this end, we make the following three contributions. First, we introduce a decomposition of GNN inference into a series of three computational phases: aggregate, combine, and update. This decomposition permits a simple representation for a broad class of GNNs we call GReTA. Second, we introduce a GNN accelerator architecture called GRIP. GRIP alleviates the bottlenecks in each phase by using separate memory subsystems specialized for the different access patterns in each phase of inference. Finally, we introduce optimizations to reduce the working memory and bandwidth requirements for GNN inference, including caching partitions of feature data, inter-phase pipelining, and merging computation. We also introduce a novel optimization called vertex-tiling that substantially improves latency by increasing the reuse of weight values during inference. Taken together, these techniques significantly reduce the latency of GNN inference over existing state-of-the-art implementations. Evaluated on a broad range of models and datasets, our accelerator reduces latency by 7-70x compared to CPU and GPU baselines.
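The three-phase decomposition described in the abstract can be illustrated with a minimal sketch. The code below is not the dissertation's GReTA abstraction or GRIP hardware; it is an assumed, simplified NumPy rendering of one GNN layer split into aggregate, combine, and update phases, to show why the aggregate phase has irregular, edge-dependent memory access while the combine phase is a regular dense matrix multiply.

```python
import numpy as np

def gnn_layer(features, edges, weights):
    """One GNN layer in the three-phase form: aggregate, combine, update.
    Illustrative sketch only; GReTA generalizes these phases with
    user-defined functions."""
    # Aggregate: each vertex sums the features of its in-neighbors.
    # The access pattern follows the edge list, so it is irregular and
    # input-dependent -- the part that maps poorly to DNN accelerators.
    agg = np.zeros_like(features)
    for src, dst in edges:
        agg[dst] += features[src]

    # Combine: a dense matrix multiply with the layer weights, a regular
    # computation that reuses the weight matrix across all vertices
    # (the reuse that vertex-tiling exploits).
    combined = agg @ weights

    # Update: an elementwise nonlinearity (ReLU here).
    return np.maximum(combined, 0.0)

# Tiny example: 3 vertices with 2 features each, edges 0->2 and 1->2.
features = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
edges = [(0, 2), (1, 2)]
weights = np.eye(2)
out = gnn_layer(features, edges, weights)
```

In this example vertex 2 aggregates vertices 0 and 1, giving output features [4, 6] after an identity combine and ReLU, while vertices with no in-neighbors stay at zero.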

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2021
Publication date 2021
Issuance monographic
Language English

Creators/Contributors

Author Kiningham, Kevin Nicholas
Degree supervisor Levis, Philip
Thesis advisor Levis, Philip
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Ré, Christopher
Degree committee member Horowitz, Mark (Mark Alan)
Degree committee member Ré, Christopher
Associated with Stanford University, Department of Electrical Engineering

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Kevin Kiningham.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.)--Stanford University, 2021.
Location https://purl.stanford.edu/rd907hk5005

Access conditions

Copyright
© 2021 by Kevin Nicholas Kiningham
License
This work is licensed under a Creative Commons Attribution 3.0 Unported license (CC BY).
