Efficient analysis using Soufflé - An experience report
July 2016

Souffle is an open-source programming framework for static program analysis. It enables the analysis designer to express static program analysis on very large code bases such as a points-to analysis for the Java Development Kit (JDK) which has more than 1.5 million variables and 600 thousand call sites. Souffle employs a Datalog-like language as a domain specific language for static program analysis. Its finite domain semantics lends to efficient execution on parallel hardware using various levels of program specialisations. A specialization hierarchy is applied to a Datalog program. As a result, highly specialized and optimised C++ code is produced that harvests the computational power of modern shared-memory/multi-core computer architectures. We have been using Souffle to explore and develop vulnerability detection analyses on the Java platform, using JDK 7, 8 and 9. These vulnerability detection analyses make use of points-to analysis (reusing parts of the DOOP framework), taint analysis, escape analysis, and other data flow-based analyses. In this talk we report on the types of analyses used, the sizes of the input relations and computed relations, as well as the the runtime and memory requirements for the analyses of such large codebases. For the program specialization, we use several translation steps. In each translation step, new optimisation opportunities open up that would not be able to exploit in the previous translation step. The first translation uses a Futamura projection to translate a declarative Datalog program to an imperative relational program for an abstract machine which we call the Relational Algebra Machine (RAM). The RAM program contains relational algebra operations to compute results produced by clauses, relation management operations to keep track of previous, current and new knowledge in the semi-naive evaluation, and imperative constructs including statement composition for sequencing the operations, and loop construction with loop exit condition to express fixed-points computations for recursively-defined relations. It also has support for parallelism. The next translation steps, translates the optimized RAM program into a C++ program that uses meta-programming techniques with templates. The last translation step, is performed by a C++ program that compiles the C++ program to a executable binary. Operations for emptiness and existence checks, range queries, insertions and unions are highly efficient because portions of the operations are pushed from runtime to compile-time using meta-programming techniques. We now outline some of the novel aspects that are in the implementation of Souffle. The first is related to indices. Since indices are costly, a minimal set of indices for a given relation is desired. We employ a discrete optimization problem to minimize indices creating only the required indices for the execution is required and hence avoiding redundancies. The second is the choice of data-structures to represent large relations...

Authors: Cristina Cifuentes

Venue: DECAF workshop:


Hardware and Software, Engineered to Work Together