Bringing Modern Compiler Technology and Programming Languages to Data Processing Engines

The Walnut project leverages GraalVM in data processing engines, focusing on the rapid embedding of new programming languages (JavaScript, Python, Java, Ruby, R and others) and the use of runtime code generation and speculative optimization in query processing.

Oracle Database Multilingual Engine

The Oracle Database Multilingual Engine (MLE) enables developers to work efficiently with DB-resident data in modern programming languages and development environments of their choice. MLE embeds GraalVM into the Oracle Database, focusing on tight integration of (PL/)SQL and guest languages. It uses GraalVM's language-agnostic Truffle interface and speculative optimizations for efficient conversions between database and guest language data types, and relies on the GraalVM Native Image feature for the embedding itself.

The Oracle Database MLE has been published as an experimental feature of Oracle Database 12c. It supports JavaScript stored procedures, user-defined functions and dynamic code snippet execution, as well as some experimental functionality for Python.

SQL Expression Compilation

The efficient execution of SQL hinges on the efficient execution of SQL expressions such as arithmetic operations or operations on strings. This is particularly true in main-memory databases where disk IO is not the bottleneck. The classic approach for evaluating SQL expressions follows a recursive descent interpretation strategy involving many virtual function calls.
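
To make the cost of the classic approach concrete, here is a minimal sketch of a tree-walking expression interpreter in plain Java (16 or later, for records). It is an illustration only, not code from Walnut or the Oracle Database: the names `Expr`, `Const`, `Column`, `Add` and `Mul` are hypothetical, and rows are modeled as a simple map of column names to `long` values. Each call to `eval` on a child node is an interface call, so even a small expression such as `price * quantity + 5` costs several dynamic dispatches per row.

```java
import java.util.Map;

public class TreeWalkingInterpreter {
    /** Every node type implements eval; each call on a child is a virtual dispatch. */
    interface Expr {
        long eval(Map<String, Long> row);
    }

    /** A literal constant in the expression. */
    record Const(long value) implements Expr {
        public long eval(Map<String, Long> row) { return value; }
    }

    /** A reference to a column of the current row. */
    record Column(String name) implements Expr {
        public long eval(Map<String, Long> row) { return row.get(name); }
    }

    /** Addition: recursively evaluates both children, then adds. */
    record Add(Expr left, Expr right) implements Expr {
        public long eval(Map<String, Long> row) {
            return left.eval(row) + right.eval(row);
        }
    }

    /** Multiplication: recursively evaluates both children, then multiplies. */
    record Mul(Expr left, Expr right) implements Expr {
        public long eval(Map<String, Long> row) {
            return left.eval(row) * right.eval(row);
        }
    }

    public static void main(String[] args) {
        // Expression tree for: price * quantity + 5
        Expr expr = new Add(
                new Mul(new Column("price"), new Column("quantity")),
                new Const(5));
        System.out.println(expr.eval(Map.of("price", 20L, "quantity", 3L))); // prints 65
    }
}
```

The interpreter is easy to write and extend, but the shape of the expression is rediscovered through dispatch on every row, which is the overhead that runtime compilation aims to remove.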

Within the scope of the Walnut project, we are investigating how the GraalVM just-in-time compiler can be used for compiling expression trees at runtime. Specifically, we study the performance gains that can be achieved by leveraging Graal’s speculative optimization approach in addition to partial evaluation.
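
The idea of speculation can be sketched in plain Java without any GraalVM dependency. The example below is an assumption-laden imitation, not the Truffle API and not Walnut code: a hypothetical `SpeculativeAdd` node starts on a fast path that assumes both operands are `long` values whose sum does not overflow, and permanently switches to a generic `BigInteger` path the first time that assumption fails. In a real Truffle interpreter the rewrite would be accompanied by deoptimization of the compiled code; here it is just a field swap.

```java
import java.math.BigInteger;

public class SpeculativeAdd {
    /** The strategy currently in use; replaced when a speculation fails. */
    interface AddImpl {
        Number add(Number a, Number b);
    }

    private AddImpl impl = this::addLongSpeculative;
    private boolean generic = false;

    /** Fast path: assumes both operands are longs and the sum fits in a long. */
    private Number addLongSpeculative(Number a, Number b) {
        if (a instanceof Long && b instanceof Long) {
            try {
                return Math.addExact((Long) a, (Long) b);
            } catch (ArithmeticException overflow) {
                // Overflow: the speculation no longer holds, handled below.
            }
        }
        // Speculation failed: switch to the generic path for all later calls.
        impl = this::addGeneric;
        generic = true;
        return addGeneric(a, b);
    }

    /** Generic path: arbitrary-precision integer addition, always correct but slower. */
    private Number addGeneric(Number a, Number b) {
        return toBig(a).add(toBig(b));
    }

    private static BigInteger toBig(Number n) {
        return n instanceof BigInteger ? (BigInteger) n : BigInteger.valueOf(n.longValue());
    }

    public Number execute(Number a, Number b) {
        return impl.add(a, b);
    }

    public boolean isGeneric() {
        return generic;
    }

    public static void main(String[] args) {
        SpeculativeAdd node = new SpeculativeAdd();
        System.out.println(node.execute(1L, 2L));             // fast path
        System.out.println(node.execute(Long.MAX_VALUE, 1L)); // overflow triggers the rewrite
        System.out.println(node.isGeneric());                 // true from now on
    }
}
```

The design choice to illustrate is that the common case pays only for a type check and an overflow-checked addition, while correctness for rare inputs is preserved by falling back rather than by making every call general.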

Data Processing Active Libraries

Another focus of the Walnut project is to design and implement a set of GraalVM Truffle interpreters (called active libraries) for the most common tasks in data processing. These libraries are self-optimizing, performing aggressive speculation based on the data they work on. Leveraging Truffle and Polyglot, each library needs to be implemented only once, and is available to any other Truffle interpreter. Examples include data serialization, conversion (e.g. from OracleNumber to IEEE 754 and back), data connectors, compression and communication.
