Shove aside, push : the case for pull-based graph processing


Abstract/Contents

Abstract
Graph processing engines typically follow either a push pattern or a pull pattern in their implementations. A pull-based engine can achieve significantly higher processing throughput than a push engine because the latter is bottlenecked by its write-heavy workload, in which many accesses require synchronization due to conflicts between threads, whereas the former performs a single thread-private write to each vertex. However, a push-based engine is better equipped to exploit the frontier optimization, which leverages the fact that often only a small subset of the graph must be accessed to complete an iteration of an application. Existing work has managed the trade-off between patterns using hybrid engines that dynamically select one per iteration based on the size of the frontier, but this comes with two disadvantages. First, programmers must implement applications twice (once for push and once for pull); second, iterations for which push is selected execute with reduced throughput.

The goal of this thesis is to improve pull engines to the point that push engines are unnecessary. We achieve this goal in two ways. First, we accelerate pull engines by introducing new strategies for parallelization and vectorization. Second, we rebuild the frontier optimization such that a pull engine can exploit it effectively, despite conventional wisdom stating this to be impossible.

In accelerating pull engines, we first consider inner loop parallelization. Graph processing engines conceptually consist of a two-level nested loop, in which the outer loop iterates over vertices and the inner loop over edges. Inner loop parallelization is important for load balance and is uniquely challenging for pull engines: performed naively, it introduces write conflicts between threads updating the same destination vertex. We therefore propose Scheduler Awareness, which allows programmers to exploit the fact that many parallel loop schedulers execute multiple consecutive loop iterations on the same core. Applied to a pull engine, it eliminates most write traffic and all write conflicts, leading to a peak speedup of 50x over a baseline configuration parallelized without scheduler awareness.

We next consider inner loop vectorization, which benefits from increased memory bandwidth utilization and increased memory operation density in the instruction stream. Unfortunately, the Compressed-Sparse data structures commonly used in existing work impede vectorization: tight packing of edges leads to many unaligned memory accesses and per-vertex boundary checks. We overcome both with Vector-Sparse, a low-level modification to the encoding of edges in Compressed-Sparse that leads to a peak speedup of 2.5x over a non-vectorized implementation.

Both strategies for accelerating pull engines are embodied in Grazelle, the first key contribution of this thesis. Grazelle is a hybrid graph processing framework that outperforms the existing frameworks Ligra, Polymer, GraphMat, and X-Stream by up to 15.2x, 4.6x, 4.7x, and 66.8x, respectively.

We next turn our attention to the frontier optimization. The frontier tracks source vertices, so a push engine can simply iterate over the vertices in the frontier, whereas a pull engine must scan unconditionally through the entire graph.
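
To make the trade-off between the two patterns concrete, the sketch below contrasts them on a PageRank-style update. This is a minimal illustration in C++ with OpenMP, not code from Grazelle or the thesis; the Graph layout, the function names, and the compare-and-swap loop used for the push-side update are assumptions made for this example.

    #include <atomic>
    #include <cstdint>
    #include <vector>

    // Hypothetical CSR-style adjacency; the layout and field names are
    // illustrative only, not the representation used in the thesis.
    struct Graph {
        std::vector<uint64_t> offsets;  // per-vertex start index into edges
        std::vector<uint32_t> edges;    // neighbor lists, tightly packed
        uint32_t num_vertices = 0;
    };

    // Push: iterate over the source vertices in the frontier and write to
    // their destinations. Two threads may target the same destination at
    // once, so each update must synchronize (a compare-and-swap loop here).
    void push_step(const Graph& g, const std::vector<uint32_t>& frontier,
                   const std::vector<float>& contrib,
                   std::vector<std::atomic<float>>& next) {
        #pragma omp parallel for
        for (size_t i = 0; i < frontier.size(); ++i) {
            const uint32_t src = frontier[i];
            for (uint64_t e = g.offsets[src]; e < g.offsets[src + 1]; ++e) {
                std::atomic<float>& dst = next[g.edges[e]];
                float old = dst.load(std::memory_order_relaxed);
                while (!dst.compare_exchange_weak(old, old + contrib[src])) {
                    // retry: old now holds the latest value
                }
            }
        }
    }

    // Pull: iterate over destination vertices and read from in-neighbors
    // (gt is the transposed graph). Each destination is written exactly once
    // by exactly one thread, so no synchronization is needed -- but every
    // vertex is visited whether or not any of its in-neighbors is active.
    void pull_step(const Graph& gt, const std::vector<float>& contrib,
                   std::vector<float>& next) {
        #pragma omp parallel for
        for (uint32_t dst = 0; dst < gt.num_vertices; ++dst) {
            float sum = 0.0f;
            for (uint64_t e = gt.offsets[dst]; e < gt.offsets[dst + 1]; ++e)
                sum += contrib[gt.edges[e]];
            next[dst] = sum;  // single thread-private write per vertex
        }
    }

Parallelizing the pull engine's outer loop, as shown, keeps every write private but load-balances poorly on skewed degree distributions, which is what motivates the inner loop parallelization via Scheduler Awareness described above.
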
To implement the frontier optimization in a pull-friendly manner, in between iterations we transform the traditional frontier into the Wedge Frontier, a representation of the same information that can be traversed efficiently by a pull engine. This transformation can be costly, so we present two optimizations. First, we produce the Wedge Frontier only when the frontier is sufficiently sparse. Second, we coarsen the granularity at which elements of the Wedge Frontier are represented. Both are effective, leading to speedups of up to 5x and 2x, respectively, over unoptimized configurations.

The second key contribution of this thesis, Wedge, is a pull-only graph processing framework that implements our proposed pull-friendly frontier optimization. Wedge capitalizes on Grazelle's high-throughput pull engine and outperforms Grazelle, Ligra, and GraphMat by up to 2.8x, 4.9x, and 185.5x, respectively. Taken together, Grazelle and Wedge improve the performance and utility of pull engines to the point that push engines are unnecessary.
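
As a rough conceptual reading (not the thesis's actual data structure), the transformation can be pictured as mapping each active source vertex to the positions its edges occupy in the pull-oriented edge list, so that the pull engine subsequently visits only those positions. The SourceIndex structure and the block_size parameter below are hypothetical stand-ins for the Wedge Frontier's real representation and its coarsening scheme.

    #include <cstdint>
    #include <vector>

    // Hypothetical precomputed index: for each source vertex, the positions
    // at which its edges appear in the pull-oriented (in-edge) list.
    struct SourceIndex {
        std::vector<uint64_t> pos_offsets;  // per-source start into positions
        std::vector<uint64_t> positions;    // positions in the pull edge list
    };

    // Sketch of producing a Wedge-Frontier-like structure: the set of
    // pull-side edge blocks touched by at least one active source vertex.
    // Coarsening the granularity (block_size > 1) shrinks the structure and
    // its production cost at the price of scanning a few inactive edges per
    // block during the subsequent pull phase.
    std::vector<uint64_t> build_wedge_frontier(
            const SourceIndex& idx, const std::vector<uint32_t>& frontier,
            uint64_t num_edges, uint64_t block_size) {
        std::vector<bool> touched(num_edges / block_size + 1, false);
        for (uint32_t src : frontier)
            for (uint64_t p = idx.pos_offsets[src];
                 p < idx.pos_offsets[src + 1]; ++p)
                touched[idx.positions[p] / block_size] = true;

        std::vector<uint64_t> blocks;  // block ids for the pull engine to visit
        for (uint64_t b = 0; b < touched.size(); ++b)
            if (touched[b]) blocks.push_back(b);
        return blocks;
    }

Under this reading, the first optimization above amounts to skipping this production step, and falling back to an unconditional scan, whenever the frontier is too dense for the transformation to pay off.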

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date ©2018
Publication date 2018
Issuance monographic
Language English

Creators/Contributors

Author Grossman, Samuel Robert
Degree supervisor Kozyrakis, Christoforos, 1974-
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Horowitz, Mark (Mark Alan)
Thesis advisor Olukotun, Oyekunle Ayinde
Degree committee member Horowitz, Mark (Mark Alan)
Degree committee member Olukotun, Oyekunle Ayinde
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Samuel Robert Grossman.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis (Ph.D.), Stanford University, 2018.
Location electronic resource

Access conditions

Copyright
© 2018 by Samuel Robert Grossman
License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported license (CC BY-NC 3.0).
