GRAPHALYTICS Bench -- A Graph Analytics Benchmark
Oracle Principal Investigator
Hassan Chafi, Vice President, Research and Advanced Development
Jay Banerjee, Sr. Director, SW Development
Sungpack Hong, Senior Research Director
Among emerging Big Data workloads, graph processing, especially at large scale, is increasingly important at Oracle and across a variety of business, engineering, and scientific domains. Dozens of very different graph-processing platforms already exist, such as Giraph, GraphLab, and even the generic Hadoop. For graph processing to continue to evolve, users must be able to select the right graph-processing platform easily, and developers and system integrators must be able to quantify the non-functional aspects of a system, from performance to scalability.
The GRAPHALYTICS project unites research and industrial institutions, namely the Delft University of Technology, Universitat Politècnica de Catalunya, CWI, and Oracle. It proposes to research and develop a comprehensive benchmarking suite for graph-processing platforms and to integrate it into the efforts of the Linked Data Benchmark Council (LDBC). In Year 1, the project produced an important conceptual advance in two directions. The first is the definition of a comprehensive benchmarking process, which includes the design and selection of benchmarking metrics, the selection of graph-processing algorithms and datasets, and the design of a reporting procedure, all of which raise unique and non-trivial challenges related to graph-processing platforms. The second is the evolution of the DATAGEN LDBC Social Network generator, which includes analytics tools for fitting the distributions of links and structural characteristics of the generated graphs to those of real social graphs taken from different environments. Graphalytics will soon be available on GitHub.