Distributed Graph Processing with PGX.D (2022)

Vasileios Trigonakis, Lucas Braun, Calin Iorgulescu

13 December 2022

Graph processing is one of the top data analytics trends. It comprises two main styles of analysis: graph algorithms and graph pattern-matching queries. Classic graph algorithms, such as PageRank, repeatedly traverse the vertices and edges of the graph and calculate some desired (mathematical) function. Graph queries enable interactive exploration and pattern matching on graphs. For example, queries like `SELECT p1.name, p2.name FROM MATCH (p1:person)-[:friend]->(p2:person) WHERE p1.country = p2.country` combine the classic operations found in SQL with graph patterns. Both algorithms and queries are very challenging workloads, especially in a distributed setting, where very large graphs are partitioned across multiple machines. In this lecture, I will present how the distributed PGX [1] engine (known as PGX.D; developed at Oracle Labs [2] Zurich) implements efficient algorithms and queries and addresses problems such as data skew and intermediate-result explosion. In brief, for graph algorithms, PGX.D can compile simple, sequential, textbook-style Green-Marl [3] algorithms into efficient distributed execution. For queries, PGX.D includes a depth-first asynchronous computation runtime [4] that limits the amount of intermediate data produced during query execution, essentially supporting "any-size" patterns.

[1] http://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html
[2] https://labs.oracle.com
[3] Green-Marl: A DSL for easy and efficient graph analysis, ASPLOS'12.
[4] aDFS: An Almost Depth-First-Search Distributed Graph-Querying System. USENIX ATC'21.
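To make the example query in the abstract concrete, the following is a minimal single-machine sketch in Python of what the pattern asks for. It is not PGX.D code and does not use any PGX API. The toy data (`people`, `friend_edges`) and the function name `match_friends_same_country` are hypothetical, and the sketch assumes every vertex is a `person` and every edge is a `friend` edge. The generator produces one match at a time instead of first materializing all candidate pairs, which loosely mirrors the idea of keeping intermediate results small.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy graph, not taken from the lecture.
# Every vertex is assumed to be a person; every edge a directed friend edge.
people: Dict[int, Dict[str, str]] = {
    1: {"name": "Ana", "country": "CH"},
    2: {"name": "Ben", "country": "CH"},
    3: {"name": "Cleo", "country": "GR"},
}
friend_edges: Dict[int, List[int]] = {1: [2, 3], 2: [1], 3: []}


def match_friends_same_country() -> Iterator[Tuple[str, str]]:
    """Lazily yield (p1.name, p2.name) for each friend edge p1 -> p2
    whose endpoints share a country."""
    for p1_id, p1 in people.items():
        # Follow each outgoing friend edge of p1 and test the filter at once,
        # so no list of unfiltered candidate pairs is ever built.
        for p2_id in friend_edges.get(p1_id, []):
            p2 = people[p2_id]
            if p1["country"] == p2["country"]:
                yield (p1["name"], p2["name"])


results = list(match_friends_same_country())
print(results)
```

On this toy graph only the two edges between Ana and Ben satisfy the country filter. A distributed engine additionally has to handle edges whose endpoints live on different machines, which this sketch does not attempt to model.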


Venue : Big Data MSc. course at ETH Zurich

File Name : 2022-12-14-ETHBigDataGuestLecture_small.pdf