The design and implementation of large-scale graph analysis language
- With the rise of worldwide social networking services, large-scale graph analysis has become important. Because of a lack of high-level programming models, MapReduce or Pregel are used for large-scale graph analysis, but require much effort to implement even simple analyses. For greater ease of use and efficiency, we propose SociaLite, a high-level graph query language based on Datalog. As a logic programming language, Datalog allows many graph algorithms to be expressed succinctly. However, its performance has not been competitive when compared to imperative programming languages. With SociaLite, users can provide simple annotations to the data layout and evaluation order; they can also define recursive aggregate functions, which can be evaluated incrementally and efficiently. Moreover, SociaLite is extended to provide high-level abstractions for distributed computation. These extensions allow users to simply annotate how data is to be distributed; then SociaLite compiler automatically generates parallel code for a cluster of multi-core machines. The evaluation of recursive aggregate functions is optimized with a delta-stepping technique on a distributed cluster. In addition, approximate computation is supported in SociaLite, allowing users to trade off accuracy for less execution time and storage space. We evaluated SociaLite with core graph algorithms, including shortest paths and PageRank, that are commonly used in graph analyses. With its optimizations to Datalog, SociaLite is shown to be much faster than Datalog, and as fast as highly optimized Java. When evaluated on a multi-core machine, SociaLite programs scaled linearly up to 16 cores, except for the shortest paths program which showed a speedup of 10 on 16 cores. In our experiment with 64 Amazon EC2 instances, SociaLite programs tracked the ideal weak scaling curve within a factor of two. Compared to Giraph, an open-source version of Pregel, SociaLite programs are 4 to 12 times faster across benchmark algorithms and 22 times more succinct on average. We also extensively evaluated state-of-the-art graph systems, GraphLab, Combinatorial BLAS, as well as Giraph, and compared their programmability and performance with SociaLite. From the evaluation, we found that SociaLite provides the simplest programming model; for BFS (Breadth First Search) algorithm, the SociaLite program is 10 times more succinct than the equivalent programs in other systems. SociaLite also demonstrated competitive performance; among the compared systems, SociaLite is second fastest after Combinatorial BLAS. Most importantly, as a high-level query language, SociaLite makes it easy for users with little programming background to write many sophisticated graph applications, that are automatically optimized to run on a cluster of machines efficiently.
|Type of resource
|electronic; electronic resource; remote
|1 online resource.
|Seo, Ji Won
|Stanford University, Department of Electrical Engineering.
|Lam, Monica S
|Lam, Monica S
|Statement of responsibility
|Submitted to the Department of Electrical Engineering.
|Thesis (Ph.D.)--Stanford University, 2015.
- © 2015 by Ji Won Seo
- This work is licensed under a Creative Commons Attribution Non Commercial 3.0 Unported license (CC BY-NC).
Also listed in
Loading usage metrics...