Characterizing and improving graph algorithm performance on multicore systems

Placeholder Show Content

Abstract/Contents

Abstract
The rise of big data analytics has contributed to the growing popularity and scale of graph datasets, positioning graph analysis as an important research area. Graph analysis is an essential tool in many domains, including the physical and social sciences, healthcare, business intelligence, and cybersecurity. The increasing scale of graph analysis problems, with graphs containing millions or billions of vertices and edges, has made parallel and distributed graph algorithms essential for effective analysis of these large datasets. At the same time, modern multicore systems have been scaling to higher core counts, with dozens of complex cores in a single system. At first glance, it would seem that graph algorithms can leverage data-level parallelism across graph vertices and edges to utilize this large number of cores to quickly process large datasets. In fact, on multicore systems, graph algorithms are typically inefficient and perform poorly. The real-world informatics graphs used for today's big data analytics are derived from online social networks, web page links, genomics data, and the like. These networks possess fundamental properties that differ from traditional graphs like trees or meshes, resulting in different execution characteristics. We study the factors behind this lack of performance and demonstrate software and hardware techniques that improve performance. First, we analyze the perfor- mance characteristics of a core set of graph analysis algorithms across several infor- matics, physical, and synthetic graph datasets using a multicore microarchitectural simulator. Our characterization indicates that poor performance is due to several fac- tors, including irregular data access patterns, load imbalance, high communication- to-computation ratio, and ineffective caching techniques. To investigate the potential for caching to improve graph algorithm performance, we study the algorithms' data locality. Cache miss rates are an unreliable metric for data locality because they are heavily influenced by dataset size, cache size, and replacement policy. Thus, we use cache-independent locality analysis techniques, including reuse distance and a probability-based locality score, to analyze data locality in graph algorithms. Based on our analysis of data locality, we find that LRU-based cache replacement policies do not provide good performance for the data access patterns characteristic of graph algorithms. Further, we show that data access patterns correlate with algorithm characteristics, graph dataset structure, and vertex degree. These insights indicate that utilization of algorithm- and dataset-specific locality information paired with an improved cache replacement policy could significantly improve graph algorithm performance. Second, we employ our knowledge of real-world graph properties to redesign the algorithm for detecting strongly connected components (SCCs) in a directed graph, a fundamental graph analysis algorithm used in many scientific and engineering do- mains. Traditional approaches in parallel SCC detection show limited performance and poor scaling behavior when applied to large real-world graph instances. We investigate the shortcomings of the conventional approach and propose a series of ex- tensions that account for the fundamental properties of real-world graphs, particularly the small-world property. Our scalable implementation offers excellent performance on diverse small-world graphs resulting in a factor of 5 to 29 times parallel speedup over an optimal sequential algorithm on 16 cores and 32 hardware threads. Third, we propose a new cache replacement policy based on our observations of data locality in graph algorithms. The Graph Priority Insertion Policy (GPIP) uses per-data-structure software priority hints to improve last-level cache hit rates by maintaining data with higher locality in the cache. This policy provides an average reduction in misses per thousand instruction (MPKI) of 3% over least-recently used (LRU) replacement. Overall, our contributions serve to expand understanding of the characteristics of graph algorithms and improve graph algorithm performance through both software and hardware means.

Description

Type of resource text
Form electronic resource; remote; computer; online resource
Extent 1 online resource.
Place California
Place [Stanford, California]
Publisher [Stanford University]
Copyright date 2019; ©2019
Publication date 2019; 2019
Issuance monographic
Language English

Creators/Contributors

Author Rodia, Nicole Celeste
Degree supervisor Olukotun, Oyekunle Ayinde
Thesis advisor Olukotun, Oyekunle Ayinde
Thesis advisor Kozyrakis, Christoforos, 1974-
Thesis advisor Rosenblum, Mendel
Degree committee member Kozyrakis, Christoforos, 1974-
Degree committee member Rosenblum, Mendel
Associated with Stanford University, Department of Electrical Engineering.

Subjects

Genre Theses
Genre Text

Bibliographic information

Statement of responsibility Nicole Celeste Rodia.
Note Submitted to the Department of Electrical Engineering.
Thesis Thesis Ph.D. Stanford University 2019.
Location electronic resource

Access conditions

Copyright
© 2019 by Nicole Celeste Rodia
License
This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).

Also listed in

Loading usage metrics...