A Worst-Case Optimal Framework for Arbitrary Join Queries


A Worst-Case Optimal Framework for Arbitrary Join Queries

Principal Investigator

Kunle Olukotun

Stanford University, Department of Electrical Engineering

Oracle Fellowship Recipient

Chris Aberger

Oracle Principal Investigator

Hassan Chafi, Vice President, Research and Advanced Development
Sungpack Hong, Director, Research And Advanced Development


Recent work on the relational join has introduced algorithms that are asymptotically faster for a large class of queries than the pairwise-join-based query plans which are prevalent in commercial database systems. We are continuing research on a query compiler, DunceCap, which takes advantage of these theoretical advances to produce optimal query plans based on variations of two algorithms that are asymptotically better than traditional pairwise algorithms. The first is Yannakakis’ classical algorithm for acyclic queries. The second is a more recent algorithm which can be applied to any query and has a linear runtime with respect to the worst-case size of the output. We study the tradeoff space between these two algorithms and their variations, and find that the query plans compiled by our system can outperform standard RDBMS algorithms as well as simple worst-case optimal algorithms by an order of magnitude on a variety of queries.

In addition, we are are continuing research on EmptyHeaded, which is the backend storage manager and execution engine of DunceCap, our query compiler. EmptyHeaded uses recent algorithmic advances in join processing to compile patterns into Boolean algebra operations that exploit SIMD parallelism. The EmptyHeaded engine demonstrates that treating graph patterns as a general join processing problem can compete with and often outperform both specialized approaches and existing OLAP systems on graph queries. The core Boolean algebra operation performed in EmptyHeaded is set intersection. Extracting SIMD parallelism during set intersections on graph data is challenging because graph data can be skewed in several different ways. Our contributions are a demonstration of this new type of engine, with Boolean algebra at its core, an exploration of set intersection representations, and algorithms for set intersections that are optimized for skew. We demonstrate that EmptyHeaded outperforms specialized graph engines by over an order of magnitude and relational systems by over two orders of magnitude. Our results suggest that this new style of engine is a promising new direction for future graph engines and accelerators.