News

21-JUN-2017

Trace Register Allocation Policies: Compile-time vs. Performance Trade-offs

Register allocation has to be done by every compiler that targets a register machine, regardless of whether it aims for fast compilation or optimal code quality. State-of-the-art dynamic compilers often use global register allocation approaches such as linear scan. Recent results suggest that non-global trace-based register allocation approaches can compete with global approaches in terms of allocation quality. Instead of processing the whole compilation unit at once, a trace-based register allocator divides the problem into linear code segments, called traces. In this work, we present a register allocation framework that can exploit the additional flexibility of traces to select different allocation strategies based on the characteristics of a trace. This allows fine-grained control over the compile-time vs. peak-performance trade-off. Our framework features three allocation strategies: a linear-scan-based approach that achieves good code quality, a single-pass bottom-up strategy that aims for short allocation times, and an allocator for trivial traces. We present six allocation policies to decide which strategy to use for a given trace. The evaluation shows that this approach can reduce allocation time by 3-43% at an average peak performance penalty of 0-9%. For systems that do not focus primarily on peak performance, our approach allows adjusting the time spent on register allocation, and therefore the overall compilation time, finding the optimal balance between compile time and peak performance according to an application's requirements.
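As a hedged illustration of how such a policy might dispatch between the three strategies, here is a minimal Java sketch; the Trace fields, thresholds, and selection rules below are hypothetical assumptions, not the paper's actual policies.

```java
// Minimal sketch of a per-trace allocation policy. The Trace fields and
// thresholds are illustrative assumptions, not taken from the paper.
enum Strategy { TRIVIAL, BOTTOM_UP, LINEAR_SCAN }

final class Trace {
    final int instructionCount;
    final boolean isTrivial; // e.g., a short connecting trace with no local values
    final boolean isHot;     // frequently executed, worth extra compile time
    Trace(int instructionCount, boolean isTrivial, boolean isHot) {
        this.instructionCount = instructionCount;
        this.isTrivial = isTrivial;
        this.isHot = isHot;
    }
}

final class AllocationPolicy {
    // Cheap allocators for trivial or cold traces; the higher-quality
    // linear-scan variant only where the extra compile time pays off.
    static Strategy select(Trace t) {
        if (t.isTrivial) return Strategy.TRIVIAL;
        if (!t.isHot || t.instructionCount < 10) return Strategy.BOTTOM_UP;
        return Strategy.LINEAR_SCAN;
    }
}
```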

Read

18-JUN-2017

Practical partial evaluation for high-performance dynamic language runtimes

Most high-performance dynamic language virtual machines duplicate language semantics in the interpreter, compiler, and runtime system. This violates the don't-repeat-yourself principle. In contrast, we define languages solely by writing an interpreter. The interpreter performs specializations, e.g., it augments the interpreted program with type information and profiling information. Compiled code is derived automatically using partial evaluation while incorporating these specializations. This makes partial evaluation practical in the context of dynamic languages: it reduces the size of the compiled code while still compiling all parts of an operation that are relevant for a particular program. When a speculation fails, execution transfers back to the interpreter, the program re-specializes in the interpreter, and later partial evaluation again transforms the new state of the interpreter to compiled code. We evaluate our approach by comparing our implementations of JavaScript, Ruby, and R with best-in-class specialized production implementations. Our general-purpose compilation system is competitive with production systems even when they have been heavily optimized for the one language they support. For our set of benchmarks, our speedup relative to the V8 JavaScript VM is 0.83x, relative to JRuby is 3.8x, and relative to GNU R is 5x.
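To make the self-specialization idea concrete, here is a plain-Java sketch of an interpreter node that rewrites itself based on the operand types it observes; this is an illustrative simplification, not the actual Truffle API.

```java
// Illustrative sketch (not the actual Truffle API): an interpreter node that
// specializes itself on observed operand types. Partially evaluating the
// interpreter together with this specialized state yields compact compiled
// code containing only the observed fast path.
abstract class AddNode {
    abstract Object execute(Object left, Object right);
}

final class UninitializedAddNode extends AddNode {
    @Override Object execute(Object left, Object right) {
        // First execution: pick a specialization based on the actual types.
        AddNode specialized = (left instanceof Integer && right instanceof Integer)
                ? new IntAddNode()
                : new GenericAddNode();
        return specialized.execute(left, right);
    }
}

final class IntAddNode extends AddNode {
    @Override Object execute(Object left, Object right) {
        // Speculation: both operands are ints. A real system would deoptimize
        // back to the interpreter and re-specialize if this stops holding.
        return (Integer) left + (Integer) right;
    }
}

final class GenericAddNode extends AddNode {
    @Override Object execute(Object left, Object right) {
        return ((Number) left).doubleValue() + ((Number) right).doubleValue();
    }
}
```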

Read

18-JUN-2017

SOAP 2017 Presentation - An Efficient Tunable Selective Points-to Analysis for Large Codebases

Points-to analysis is a fundamental static program analysis technique for tools including compilers and bug-checkers. Although object-based context sensitivity is known to improve the precision of points-to analysis, scaling it to large Java codebases remains a challenge. In this work, we develop a tunable, client-independent, object-sensitive points-to analysis framework where heap cloning is applied selectively. This approach is aimed at large codebases where standard analysis is typically expensive. Our design includes a pre-analysis that determines the program points that contribute to the cost of an object-sensitive points-to analysis. A subsequent analysis then determines the context depth for each allocation site. While our framework can run standalone, it is also possible to tune it – the user of the framework can use knowledge of the codebase being analysed to influence the selection of expensive program points as well as the process of differentiating the required context depth. Overall, the approach determines where cloning is beneficial and where it is unlikely to be. We have implemented our approach using Soufflé (a Datalog compiler) and an extension of the DOOP framework. Our experiments on large programs, including OpenJDK, show that our technique is efficient and precise. For OpenJDK, our analysis reduces runtime by 27% and memory usage by 18% for a negligible loss of precision, while for Jython from the DaCapo benchmark suite, the same analysis reduces runtime by 91% with no loss of precision.
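The selective idea can be pictured as a mapping from allocation sites to context depths chosen by the pre-analysis. A minimal Java sketch follows; the cost threshold and depth values are hypothetical, not the paper's.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of selective heap cloning: a pre-analysis assigns each allocation
// site a context depth, and the main analysis clones objects only up to that
// depth. The cost threshold and depths here are illustrative assumptions.
final class SelectiveContextPolicy {
    private final Map<String, Integer> depthBySite = new HashMap<>();

    void decide(String allocationSite, long estimatedCost, boolean precisionCritical) {
        if (!precisionCritical) {
            depthBySite.put(allocationSite, 0);  // context-insensitive: cloning not beneficial
        } else if (estimatedCost > 1_000_000L) {
            depthBySite.put(allocationSite, 1);  // expensive site: shallow cloning only
        } else {
            depthBySite.put(allocationSite, 2);  // cheap and precision-critical: deeper cloning
        }
    }

    int contextDepth(String allocationSite) {
        return depthBySite.getOrDefault(allocationSite, 1);
    }
}
```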

View

16-JUN-2017

Lenient Execution of C on a JVM -- How I Learned to Stop Worrying and Execute the Code

Most C programs do not strictly conform to the C standard, and often exhibit undefined behavior, e.g., on signed integer overflow. When compiled by non-optimizing compilers, such programs often behave as the programmer intended. However, optimizing compilers may exploit undefined semantics for more aggressive optimizations, thus possibly breaking the code. Analysis tools can help to find and fix such issues. Alternatively, one could define a C dialect in which clear semantics are defined for frequent program patterns whose behavior would otherwise be undefined. In this paper, we present such a dialect, called Lenient C, that specifies semantics for behavior that the standard leaves open for interpretation. Specifying additional semantics enables programmers to safely rely on otherwise undefined patterns. Lenient C is intended to be executed on a managed runtime such as the JVM. We demonstrate how we implemented the dialect in Safe Sulong, a C interpreter with a dynamic compiler that runs on the JVM.
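For intuition, signed integer overflow is one such pattern: it is undefined in standard C but fully defined in Java, which is the kind of semantics a JVM-hosted dialect can inherit. A small Java sketch follows; whether Lenient C adopts exactly this wraparound rule is not stated in the abstract above.

```java
// Java defines signed overflow as two's-complement wraparound; standard C
// leaves it undefined, so an optimizing C compiler may assume x + 1 > x.
public final class WraparoundDemo {
    public static void main(String[] args) {
        int x = Integer.MAX_VALUE;
        System.out.println(x + 1 == Integer.MIN_VALUE); // prints true in Java
    }
}
```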

Read

06-JUN-2017

Towards Understandable Smart Contracts

Blockchains and smart contracts can facilitate trustworthy business processes between participants who do not trust each other. However, understanding smart contracts typically requires programming skills. We outline research towards enabling smart contracts that are understandable by humans too.

Read

24-MAY-2017

Inference of Security-Sensitive Entities in Libraries

Programming languages such as Java and C# execute code with different levels of trust in the same process, and rely on an access control model with fine-grained permissions to protect program code. Permissions are checked programmatically, and rely on programmer discipline. This can lead to subtle errors. To enable automated security analysis of unauthorised access or information flow, it is necessary to reason about security-sensitive entities in libraries that must be protected by appropriate sanitization/declassification via permission checks. Unfortunately, security-sensitive entities are not clearly identified. In this paper, we investigate security-sensitive entities used in Java-like languages, and develop a static program analysis technique to identify them in large codebases by analysing the patterns of permission checks. Although the technique is generic, our focus is on Java, where checkPermission calls are used to guard potentially security-sensitive entities. Our inference analysis uses two parameters, called proximity and coverage, to reduce false positive/negative reports. The usefulness of the analysis is illustrated by the results obtained while checking OpenJDK7-b147 for conformance to the Java Secure Coding Guidelines that relate to confidentiality and integrity requirements.
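The guard pattern the analysis mines looks roughly like the following Java sketch; the class, field, and choice of permission are illustrative, not taken from the paper.

```java
import java.io.FilePermission;

// Sketch of the checkPermission guard pattern: the candidate sensitive
// entity (here a file path) is only returned after a programmatic
// permission check. Names and the permission used are illustrative.
public final class SensitiveConfig {
    private static final String SECRET_PATH = "/etc/app/secret"; // candidate sensitive entity

    public static String getSecretPath() {
        SecurityManager sm = System.getSecurityManager();
        if (sm != null) {
            // Throws SecurityException for unauthorised callers.
            sm.checkPermission(new FilePermission(SECRET_PATH, "read"));
        }
        return SECRET_PATH;
    }
}
```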

View

23-MAY-2017

Transactional Lock Elision Meets Combining

Flat combining (FC) and transactional lock elision (TLE) are two techniques that facilitate efficient multi-threaded access to a sequentially implemented data structure protected by a lock. FC allows threads to delegate their operations to another (combiner) thread, and to benefit from having multiple operations executed by that thread under the lock, using combining and elimination optimizations tailored to the specific data structure. TLE employs hardware transactional memory (HTM) to allow multiple threads to apply their operations concurrently as long as they do not conflict. This paper explores how these two radically different techniques can complement one another, and introduces the HTM-assisted Combining Framework (HCF). HCF leverages HTM to allow multiple combiners to run concurrently with each other, as well as with other, non-combiner threads. This makes HCF a good fit for data structures and workloads in which some operations may conflict with each other while others may run concurrently without conflicts. HCF achieves all this with changes to the sequential code similar to those required by TLE and FC, and in particular, without requiring the programmer to reason about concurrency.
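For background on the FC half of the combination, here is a minimal flat-combining counter in Java. HTM is not accessible from pure Java, so the HTM-assisted part of HCF is not shown, and the fire-and-forget structure below is a deliberate simplification.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

// Minimal flat-combining sketch: threads publish operations, and whichever
// thread acquires the lock becomes the combiner and applies all of them.
final class FlatCombiningCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Queue<Runnable> publications = new ConcurrentLinkedQueue<>();
    private long value;

    void increment() {
        publications.add(() -> value++);   // publish the operation
        if (lock.tryLock()) {              // become the combiner if the lock is free
            try {
                Runnable op;
                while ((op = publications.poll()) != null) {
                    op.run();              // apply everyone's operations under the lock
                }
            } finally {
                lock.unlock();
            }
        }
        // Non-combiner threads simply return here; a full implementation
        // would wait for its own operation's result instead.
    }
}
```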

Read

22-MAY-2017

Polyglot Native: Scala, Kotlin, and Other JVM-Based Languages with Instant Startup and Low Footprint

Execution of JVM-based programs uses bytecode loading and interpretation, just-in-time compilation, and monolithic heaps. This causes JVM-based programs to start up slowly with a high memory footprint. In recent years, several projects have been developed to address these issues: ahead-of-time compilation for the JVM (JEP 295) improves JVM startup time, while Scala Native and Kotlin/Native provide language-specific solutions by compiling code with LLVM and providing language-specific runtimes. We present Polyglot Native: an ahead-of-time compiler for Java bytecode combined with a low-footprint VM. With Polyglot Native, programs written in Kotlin, Scala, and other JVM-based languages have minimal startup time as they are compiled to native executables. The footprint of compiled programs is minimized by using a chunked heap and by reducing the necessary program metadata. In this talk, we show the architecture of Polyglot Native and compare it to existing projects. Then, we live-demo a project that compiles code from Kotlin, Scala, Java, and C into a single binary executable. Finally, we discuss the intricacies of interoperability between Polyglot Native and C.

View

17-MAY-2017

Dynamic Adaptation of User Migration Policies in Distributed Virtual Environments

A distributed virtual environment (DVE) consists of multiple network nodes (servers), each of which can host many users that consume CPU resources on that node and communicate with users on other nodes. Users can be dynamically migrated between the nodes, and the ultimate goal of the migration policy is to minimize the average system response time perceived by the users. To achieve this, the user migration policy should minimize network communication while balancing the load among the nodes so that the CPU resources of individual nodes are not overloaded. This paper considers a multi-player online game as an example of a DVE and presents an adaptive distributed user migration policy, which uses reinforcement learning to tune itself so as to minimize the average system response time perceived by the users. The performance of the self-tuning policy was compared on a simulator against a standard non-adaptive migration policy and the optimal static user allocation policy in a variety of scenarios; the self-tuning policy greatly outperformed both benchmark policies, with the performance difference increasing as the network became more heavily loaded.
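As a hedged sketch of the tuning step, the following Java snippet shows a textbook Q-learning update with response time as negative reward; the state encoding, learning rate, and discount factor are illustrative assumptions, since the abstract does not give them.

```java
// Textbook Q-learning update over (state, action) pairs, with reward defined
// as the negated response time. Parameters here are illustrative, not the
// paper's; the paper's state encoding is also not shown.
final class MigrationLearner {
    private final double[][] q;        // q[state][action]
    private final double alpha = 0.1;  // learning rate (illustrative)
    private final double gamma = 0.9;  // discount factor (illustrative)

    MigrationLearner(int states, int actions) {
        q = new double[states][actions];
    }

    void update(int state, int action, double responseTime, int nextState) {
        double reward = -responseTime;  // lower response time = higher reward
        double best = Double.NEGATIVE_INFINITY;
        for (double v : q[nextState]) best = Math.max(best, v);
        q[state][action] += alpha * (reward + gamma * best - q[state][action]);
    }
}
```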

Read

17-MAY-2017

Evaluating Quality of Security Testing of the JDK

The document outlines the main challenges in evaluating test suites that check for security properties. Specifically, it considers testing the security properties of the JDK.

Read

17-MAY-2017

Truffle: your favorite language on JVM

Graal/Truffle is a project that aims to build a multi-language, multi-tenant, multi-threaded, multi-node, multi-tooling, and multi-system environment on top of the JVM. Imagine that, in order to develop a (dynamic) language implementation, all you need to do is write its interpreter in Java, and you immediately get amazing peak performance, a choice of several carefully tuned garbage collectors, tooling support, high-speed interoperability with other languages, and more. In this talk we'll take a look at how Truffle and Graal achieve this and demonstrate the results on Ruby, JavaScript, and R. Particular attention will be given to FastR, the Truffle-based R language implementation, its performance compared to GNU R, and its support for Java interoperability, including graphics.

Read

11-MAY-2017

Persistent Memcached: Bringing Legacy Code to Byte-Addressable Persistent Memory

We report our experience building and evaluating pmemcached, a version of memcached ported to byte-addressable persistent memory. Persistent memory is expected to not only improve the overall performance of applications’ persistence tier, but also vastly reduce the “warm up” time needed by applications after a restart. We decided to test this hypothesis on memcached, a popular key-value store. We took the extreme view of persisting memcached’s entire state, resulting in a virtually instantaneous warm-up phase. Since memcached is already optimized for DRAM, we expected our port to be a straightforward engineering effort. However, the effort turned out to be surprisingly complex, and we encountered several non-trivial problems that challenged the boundaries of memcached’s architecture. We detail these experiences and the corresponding lessons learned.

Read

26-APR-2017

UMASS Data Science Talks

I'll be giving two talks at the UMASS data science event. The first talk is on our multilingual word embedding work. The second talk is on our constrained-inference approach for sequence-to-sequence neural networks. Relevant IP is covered in two patents and both pieces of work have previously been approved for publication (patent ref numbers and archivist ids provided below).

Read
