News

16-JUN-2017

Lenient Execution of C on a JVM -- How I Learned to Stop Worrying and Execute the Code

Most C programs do not strictly conform to the C standard, and often exhibit undefined behavior, e.g., on signed integer overflow. When compiled by non-optimizing compilers, such programs often behave as the programmer intended. However, optimizing compilers may exploit undefined semantics to perform more aggressive optimizations, thus possibly breaking the code. Analysis tools can help to find and fix such issues. Alternatively, one could define a C dialect that assigns clear semantics to frequent program patterns whose behavior would otherwise be undefined. In this paper, we present such a dialect, called Lenient C, that specifies semantics for behavior that the standard leaves open to interpretation. Specifying additional semantics enables programmers to safely rely on otherwise undefined patterns. Lenient C is designed to be executed on a managed runtime such as the JVM. We demonstrate how we implemented the dialect in Safe Sulong, a C interpreter with a dynamic compiler that runs on the JVM.

Read

06-JUN-2017

Towards Understandable Smart Contracts

Blockchains and smart contracts can facilitate trustworthy business processes between participants who do not trust each other. However, understanding smart contracts typically requires programming skills. We outline research towards enabling smart contracts that are understandable by humans too.

Read

24-MAY-2017

Inference of Security-Sensitive Entities in Libraries

Managed platforms such as Java and the CLR execute code with different levels of trust in the same process, and rely on an access control model with fine-grained permissions to protect program code. Permissions are checked programmatically, and rely on programmer discipline. This can lead to subtle errors. To enable automatic security analysis of unauthorised access or information flow, it is necessary to reason about security-sensitive entities in libraries that must be protected by appropriate sanitization/declassification via permission checks. Unfortunately, security-sensitive entities are not clearly identified. In this paper, we investigate security-sensitive entities used in Java-like languages, and develop a static program analysis technique to identify them in large codebases by analysing the patterns of permission checks. Although the technique is generic, our focus is on Java, where checkPermission calls are used to guard potential security-sensitive entities. Our inference analysis uses two parameters, called proximity and coverage, to reduce false positive/negative reports. The usefulness of the analysis is illustrated by the results obtained while checking OpenJDK7-b147 for conformance to the Java Secure Coding Guidelines relating to confidentiality and integrity requirements.

View

23-MAY-2017

Transactional Lock Elision Meets Combining

Flat combining (FC) and transactional lock elision (TLE) are two techniques that facilitate efficient multi-thread access to a sequentially implemented data structure protected by a lock. FC allows threads to delegate their operations to another (combiner) thread, and benefit from executing multiple operations by that thread under the lock through combining and elimination optimizations tailored to the specific data structure. TLE employs hardware transactional memory (HTM) that allows multiple threads to apply their operations concurrently as long as they do not conflict. This paper explores how these two radically different techniques can complement one another, and introduces the HTM-assisted Combining Framework (HCF). HCF leverages HTM to allow multiple combiners to run concurrently with each other, as well as with other, non-combiner threads. This makes HCF a good fit for data structures and workloads in which some operations may conflict with each other while others may run concurrently without conflicts. HCF achieves all that with changes to the sequential code similar to those required by TLE and FC, and in particular, without requiring the programmer to reason about concurrency.

Read

22-MAY-2017

Polyglot Native: Scala, Kotlin, and Other JVM-Based Languages with Instant Startup and Low Footprint

Execution of JVM-based programs relies on bytecode loading and interpretation, just-in-time compilation, and monolithic heaps. This causes JVM-based programs to start up slowly with a high memory footprint. In recent years, several projects have been developed to address these issues: ahead-of-time compilation for the JVM (JEP 295) improves JVM startup time, while Scala Native and Kotlin/Native provide language-specific solutions by compiling code with LLVM and providing language-specific runtimes. We present Polyglot Native: an ahead-of-time compiler for Java bytecode combined with a low-footprint VM. With Polyglot Native, programs written in Kotlin, Scala, and other JVM-based languages have minimal startup time, as they are compiled to native executables. The footprint of compiled programs is minimized by using a chunked heap and by reducing necessary program metadata. In this talk, we show the architecture of Polyglot Native and compare it to existing projects. Then, we live-demo a project that compiles code from Kotlin, Scala, Java, and C into a single binary executable. Finally, we discuss the intricacies of interoperability between Polyglot Native and C.

View

17-MAY-2017

Dynamic Adaptation of User Migration Policies in Distributed Virtual Environments

A distributed virtual environment (DVE) consists of multiple network nodes (servers), each of which can host many users that consume CPU resources on that node and communicate with users on other nodes. Users can be dynamically migrated between the nodes, and the ultimate goal for the migration policy is to minimize the average system response time perceived by the users. In order to achieve this, the user migration policy should minimize network communication while balancing the load among the nodes so that CPU resources of the individual nodes are not overloaded. This paper considers a multi-player online game as an example of a DVE and presents an adaptive distributed user migration policy, which uses Reinforcement Learning to tune itself so as to minimize the average system response time perceived by the users. Performance of the self-tuning policy was compared on a simulator against the standard non-adaptive benchmark migration policy and against the optimal static user allocation policy in a variety of scenarios; the self-tuning policy was shown to greatly outperform both benchmark policies, with the performance difference increasing as the network became more overloaded.

Read

17-MAY-2017

Evaluating Quality of Security Testing of the JDK

The document outlines the main challenges in evaluating test suites that check for security properties. Specifically, it considers testing the security properties of the JDK.

Read

17-MAY-2017

Truffle: your favorite language on JVM

Graal/Truffle is a project that aims to build a multi-language, multi-tenant, multi-threaded, multi-node, multi-tooling and multi-system environment on top of the JVM. Imagine that, in order to develop a (dynamic) language implementation, all you need to do is write its interpreter in Java, and you immediately get amazing peak performance, a choice of several carefully tuned garbage collectors, tooling support, high-speed interoperability with other languages and more. In this talk we'll take a look at how Truffle and Graal can achieve this and demonstrate the results on Ruby, JavaScript and R. Particular attention will be given to FastR, the Truffle-based R language implementation, its performance compared to GNU R, and its support for Java interoperability including graphics.

Read

11-MAY-2017

Persistent Memcached: Bringing Legacy Code to Byte-Addressable Persistent Memory

We report our experience building and evaluating pmemcached, a version of memcached ported to byte-addressable persistent memory. Persistent memory is expected to not only improve overall performance of applications’ persistence tier, but also vastly reduce the “warm up” time needed for applications after a restart. We decided to test this hypothesis on memcached, a popular key-value store. We took the extreme view of persisting memcached’s entire state, resulting in a virtually instantaneous warm up phase. Since memcached is already optimized for DRAM, we expected our port to be a straightforward engineering effort. However, the effort turned out to be surprisingly complex, and we encountered several non-trivial problems that challenged the boundaries of memcached’s architecture. We detail these experiences and the corresponding lessons learned.

Read

26-APR-2017

UMASS Data Science Talks

I'll be giving two talks at the UMASS data science event. The first talk is on our multilingual word embedding work. The second talk is on our constrained-inference approach for sequence-to-sequence neural networks. Relevant IP is covered in two patents and both pieces of work have previously been approved for publication (patent ref numbers and archivist ids provided below).

Read


22-APR-2017

Pandia: Comprehensive Contention-Sensitive Thread Placement

Pandia is a system for modelling the performance of in-memory parallel workloads. It generates a description of a workload from a series of profiling runs, and combines this with a description of the machine's hardware to model the workload's performance over different thread counts and different placements of those threads.

The approach is “comprehensive” in that it accounts for contention at multiple resources, such as processor functional units and memory channels. The points of contention for a workload can shift between resources as the degree of parallelism and thread placement change. Pandia accounts for these changes and provides a close correspondence between predicted and actual performance. Testing a set of 22 benchmarks on 2-socket Intel machines fitted with chips ranging from Sandy Bridge to Haswell, we see median differences of 1.05% to 0% between the fastest predicted placement and the fastest measured placement, and median errors of 8% to 4% across all placements.

Pandia can be used to optimize the performance of a given workload, for instance by identifying whether or not multiple processor sockets should be used, and whether or not the workload benefits from using multiple threads per core. In addition, Pandia can be used to identify opportunities for reducing resource consumption where additional resources are not matched by additional performance, for instance by limiting a workload to a small number of cores when its scaling is poor.

Read

22-APR-2017

A General Model for Placement of Workloads on Multicore NUMA Systems

The problem of mapping threads, or virtual cores, to physical cores on multicore systems has been studied for over a decade. Despite this effort, there is still no method that will help us decide in real time and for arbitrary workloads the relative impact of different mappings on performance. Prior work has made large strides in this field, but these solutions addressed a limited set of concerns (e.g., only shared caches and memory controllers, or only asymmetric interconnects), assuming hardware with specific properties and leaving us unable to generalize the model to other systems. Our contribution is an abstract machine model that enables us to automatically build a performance prediction model for any machine with a hierarchy of shared resources. In the process of developing the methodology for building predictive models, we discovered pitfalls of using hardware performance counters, a de facto technique embraced by the community for decades. Our new methodology does not rely on hardware counters, at the expense of trying a handful of additional workload mappings (out of many possible) at runtime. Using this methodology, data center operators can decide on the smallest number of NUMA (CPU+memory) nodes to use for the target application or service (which we assume to be encapsulated in a virtual container, so as to match the reality of modern cloud systems such as AWS), while still meeting performance goals. More broadly, the methodology empowers them to efficiently “pack” virtual containers onto the physical hardware in a data center.

Read

