Walnut

OVERVIEW

  • The Walnut project aims at improving data processing along three axes:

    1. define a framework for rapidly and efficiently integrating new programming languages (e.g., JavaScript, Python, or R) into existing data processing engines;
    2. apply the runtime compilation strategies that modern programming languages have enjoyed for over a decade to data processing engines and query execution in particular;
    3. use runtime compilation technologies to rapidly develop new extensions to data processing engines and to enable efficient interoperability between engines specific to different data models (e.g., graph and relational).

    The Walnut project is currently pursuing several efforts to achieve these goals.

      Multilingual Execution Framework

      Many programming languages are used in today’s business applications. Browser-side web applications are primarily developed in JavaScript. Business logic in the middle tier is mainly implemented in languages such as Java, Scala, C/C++, PHP, Ruby, Python, and, increasingly, JavaScript. R and Python are typically used for machine learning and data science. Data processing is still dominated by SQL, although other languages have recently been adopted in data processing platforms (e.g., Scala, Python, or R in Apache Spark) as well as in database systems.

      As data processing logic becomes more sophisticated and specialized, the programming languages used to develop application logic are increasingly being pushed down into data processing engines. Achieving a secure, high-performance implementation of a new programming language is a daunting, engineering-intensive task. Integrating that implementation into a data processing engine adds further complexity. Arguably, repeating this effort for every newly popular language is not sustainable. The complexity can be dramatically reduced if these programming languages are all implemented on a common foundation.

      Oracle Labs’ Graal project provides such a foundation, in the form of an ecosystem for developing interoperable, high-performance programming language implementations. The common foundation consists of a state-of-the-art, feedback-driven speculative runtime compiler (Graal), a framework for developing self-modifying interpreters that exploit this ability for speculative optimization (Truffle), and an embeddable runtime to execute languages implemented on that common foundation (Substrate VM).
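
      The idea of a self-modifying interpreter can be illustrated with a minimal sketch. This is not Truffle code: the class AddNode and its state names are hypothetical. It only shows the rewriting step, assuming a node that specializes itself for integer operands and falls back to a generic version when that speculation fails. A real runtime compiler would additionally turn the specialized version into machine code.

```python
class AddNode:
    """Hypothetical AST node that rewrites its own implementation as it runs."""

    def __init__(self):
        # Start uninitialized; the first execution picks a specialization.
        self.state = "uninitialized"
        self._impl = self._execute_uninitialized

    def execute(self, left, right):
        return self._impl(left, right)

    def _execute_uninitialized(self, left, right):
        # Use the first observed operand types as profiling feedback.
        if self._both_ints(left, right):
            self._rewrite("int", self._execute_int)
        else:
            self._rewrite("generic", self._execute_generic)
        return self._impl(left, right)

    def _execute_int(self, left, right):
        # Guard: the speculation holds only while both operands are ints.
        if self._both_ints(left, right):
            return left + right
        # Speculation failed: rewrite to the generic version for good.
        self._rewrite("generic", self._execute_generic)
        return self._impl(left, right)

    def _execute_generic(self, left, right):
        # Slow path that handles any numeric operands.
        return float(left) + float(right)

    def _rewrite(self, state, impl):
        self.state = state
        self._impl = impl

    @staticmethod
    def _both_ints(left, right):
        return type(left) is int and type(right) is int
```

      After one call with two integers the node is in the "int" state; a later call with a non-integer operand moves it permanently to the "generic" state.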

      The Walnut project extends this common foundation to ease its integration into data processing engines, focusing on enabling rapid embedding of new programming languages. In a first version, this data-centric multilingual execution framework aims at executing stored procedures and user-defined extensions (e.g., scalar, table, and aggregation functions) that run in the data processing engine’s address space, tightly coupled with its query execution engine.

      Being the most popular language today (2017), JavaScript is the first language we are targeting with this framework. The Walnut project looks first and foremost into integration with the Oracle Database, but also with other data processing platforms in Oracle’s product portfolio, in particular MySQL and Exadata.

      The multilingual framework offers additional opportunities for a deeper and more efficient integration of languages into the database. For example, it enables the efficient implementation of data conversions (e.g., from Oracle Number to IEEE 754 double or to 32-bit integer) using Graal’s speculative optimization capabilities. It also enables exporting database data types (e.g., Oracle Number) and their operations to the embedded programming language. Ultimately, the framework could go beyond UDFs by allowing database developers to write database operators more productively in languages that are higher-level than C.
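
      A speculative data conversion might look roughly like the following sketch. It is an illustration under stated assumptions, not the project's implementation: Python's decimal.Decimal stands in for Oracle Number (whose actual encoding is not modeled here), and the class NumberConverter with its fields is hypothetical. The converter optimistically assumes that every value fits a 32-bit integer and permanently switches to the double path the first time that guard fails.

```python
from decimal import Decimal

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class NumberConverter:
    """Hypothetical converter that speculates on 32-bit integer values."""

    def __init__(self):
        self.assume_int32 = True  # optimistic assumption, dropped on failure
        self.deopt_count = 0      # how often the speculation was invalidated

    def convert(self, value: Decimal):
        if self.assume_int32:
            # Guard: the value must be integral and inside the int32 range.
            is_integral = value == value.to_integral_value()
            if is_integral and INT32_MIN <= value <= INT32_MAX:
                return int(value)
            # Guard failed: give up the assumption and use doubles from now on.
            self.assume_int32 = False
            self.deopt_count += 1
        # General path: convert to a Python float (an IEEE 754 double).
        return float(value)
```

      As long as a column only contains small integers, the cheap integer path is taken. A single fractional or out-of-range value invalidates the assumption, which mirrors how a speculative compiler deoptimizes and continues with more general code.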

      SQL Expression Compilation

      The efficient execution of SQL hinges on the efficient execution of SQL expressions such as arithmetic operations or operations on strings. This is particularly true in main-memory databases, where disk I/O is not the bottleneck. The classic approach to evaluating SQL expressions follows a recursive descent interpretation strategy involving many virtual function calls.
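
      The classic interpretation strategy can be sketched as a tree of expression nodes, each with its own evaluate method. The class names below (Expr, Column, Const, Add, Mul, Upper) are hypothetical and do not come from any particular engine. Every call to evaluate is dynamically dispatched, which is the per-row overhead the paragraph above refers to.

```python
from abc import ABC, abstractmethod


class Expr(ABC):
    """Base class: every node is evaluated through a dynamically dispatched call."""

    @abstractmethod
    def evaluate(self, row):
        ...


class Column(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        return row[self.name]


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def evaluate(self, row):
        return self.value


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, row):
        # Recursive descent: evaluate both children, then combine the results.
        return self.left.evaluate(row) + self.right.evaluate(row)


class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, row):
        return self.left.evaluate(row) * self.right.evaluate(row)


class Upper(Expr):
    """Example of a string operation."""

    def __init__(self, child):
        self.child = child

    def evaluate(self, row):
        return self.child.evaluate(row).upper()


# price * quantity + 1, evaluated once per row
tree = Add(Mul(Column("price"), Column("quantity")), Const(1))
total = tree.evaluate({"price": 3, "quantity": 4})
```

      Evaluating this small tree for one row already takes five evaluate calls, and that cost is paid again for every row processed.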

      Within the scope of the Walnut project, we are investigating how the Graal just-in-time compiler can be used for compiling expression trees at runtime. Specifically, we study the performance gains that can be achieved by leveraging Graal’s speculative optimization approach in addition to partial evaluation.
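
      The effect of partial evaluation on an expression tree can be approximated by hand, as in the sketch below. It does not use Graal. It swaps in plain source generation plus Python's built-in eval as a stand-in, and the tuple-based tree encoding and the function names (interpret, to_source, compile_expr) are assumptions made for illustration. Because the tree is fixed for a given query, the tree walk can be done once up front, leaving a single function that only depends on the row.

```python
import operator

# Supported binary operators: source symbol and interpreter function.
_OPS = {
    "add": ("+", operator.add),
    "sub": ("-", operator.sub),
    "mul": ("*", operator.mul),
}


def interpret(expr, row):
    """Baseline: walk the tree again for every row."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "col":
        return row[expr[1]]
    if kind in _OPS:
        _, fn = _OPS[kind]
        return fn(interpret(expr[1], row), interpret(expr[2], row))
    raise ValueError(f"unknown expression kind: {kind}")


def to_source(expr):
    """Specialize the interpreter to one tree by emitting equivalent source."""
    kind = expr[0]
    if kind == "const":
        return repr(expr[1])
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind in _OPS:
        symbol, _ = _OPS[kind]
        return f"({to_source(expr[1])} {symbol} {to_source(expr[2])})"
    raise ValueError(f"unknown expression kind: {kind}")


def compile_expr(expr):
    """Walk the tree once and return a function of the row only."""
    source = f"lambda row: {to_source(expr)}"
    # The generated code needs no builtins, so none are exposed to eval.
    return eval(source, {"__builtins__": {}})


# price * quantity + 1
tree = ("add", ("mul", ("col", "price"), ("col", "quantity")), ("const", 1))
compiled = compile_expr(tree)
```

      Here to_source(tree) yields the string ((row['price'] * row['quantity']) + 1), and compiled(row) returns the same result as interpret(tree, row) without dispatching on node kinds for each row. A runtime compiler built on partial evaluation derives such specialized code from the interpreter automatically and can add speculative assumptions, for example about operand types, on top.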