What is a Secure Programming Language? (POPL slides)

Our most sensitive and important software systems are written in programming languages that are inherently insecure, making the security of the systems themselves extremely challenging. It is often said that these systems were written with the best tools available at the time, so over time with newer languages will come more security. But we contend that all of today’s mainstream programming languages are insecure, including even the most recent ones that come with claims that they are designed to be “secure”. Our real criticism is the lack of a common understanding of what “secure” might mean in the context of programming language design. We propose a simple data-driven definition for a secure programming language: that it provides first-class language support to address the causes for the most common, significant vulnerabilities found in real-world software. To discover what these vulnerabilities actually are, we have analysed the National Vulnerability Database and devised a novel categorisation of the software defects reported in the database. This leads us to propose three broad categories, which account for over 50% of all reported software vulnerabilities, that as a minimum any secure language should address. While most mainstream languages address at least one of these categories, interestingly, we find that none address all three. Looking at today’s real-world software systems, we observe a paradigm shift in design and implementation towards service-oriented architectures, such as microservices. Such systems consist of many fine-grained processes, typically implemented in multiple languages, that communicate over the network using simple web-based protocols, often relying on multiple software environments such as databases. In traditional software systems, these features are the most common locations for security vulnerabilities, and so are often kept internal to the system. In microservice systems, these features are no longer internal but external, and now represent the attack surface of the software system as a whole. The need for secure programming languages is probably greater now than it has ever been.



PGX and Graal/Truffle/Active Libraries

A guest lecture in the CS4200 Compiler Construction course at Delft University of Technology ( about PGX and Graal/Truffle/Active Libraries.



Computationally Easy, Spectrally Good Multipliers for Congruential Pseudorandom Number Generators

Congruential pseudorandom number generators rely on good multipliers, that is, integers that have good performance with respect to the spectral test. We provide lists of multipliers with a good lattice structure up to dimension eight for generators with typical power-of-two moduli, analyzing in detail multipliers close to the square root of the modulus, whose product can be computed quickly.



Efficient Multi-Word Compare and Swap.

Atomic lock-free multi-word compare-and-swap (MCAS) is a powerful tool for designing concurrent algorithms. Yet, its widespread usage has been limited because lock-free implementations of MCAS make heavy use of expensive compare-and-swap (CAS) instructions. Existing MCAS implementations indeed use at least 2k+1 CASes per k-CAS. This leads to the natural desire to minimize the number of CASes required to implement MCAS. We first prove in this paper that it is impossible to "pack" the information required to perform a k-word CAS (k-CAS) in less than k locations to be CASed. Then we present the first algorithm that requires k+1 CASes per call to k-CAS in the common uncontended case. We implement our algorithm and show that it outperforms a state-of-the-art baseline in a variety of benchmarks in most considered workloads. We also present a durably linearizable (persistent memory friendly) version of our MCAS algorithm using only 2 persistence fences per call, while still only requiring k+1 CASes per k-CAS.



Microsecond Consensus for Microsecond Applications.

We consider the problem of making apps fault-tolerant through replication, when apps operate at the microsecond scale, as in finance, embedded computing, and microservices apps. These apps need a replication scheme that also operates at the microsecond scale, otherwise replication becomes a burden. We propose Mu, a system that takes less than 1.3 microseconds to replicate a (small) request in memory, and less than a millisecond to fail-over the system - this cuts the replication and fail-over latencies of the prior systems by at least 61% and 90%.
Mu implements bona fide state machine replication/consensus (SMR) with strong consistency for a generic app, but it really shines on microsecond apps, where even the smallest overhead is significant. To provide this performance, Mu introduces a new SMR protocol that carefully leverages RDMA. Roughly, in Mu a leader replicates a request by simply writing it directly to the log of other replicas using RDMA, without any additional communication. Doing so, however, introduces the challenge of handling concurrent leaders, changing leaders, garbage collecting the logs, and more - challenges that we address in this paper through a judicious combination of RDMA permissions and distributed algorithmic design.
We implemented Mu and used it to replicate several systems: a financial exchange app called Liquibook, Redis, Memcached, and HERD. Our evaluation shows that Mu incurs a small replication latency, in some cases being the only viable replication system that incurs an acceptable overhead.



A DSL-based framework for performance assessment

Performance assessment is an essential verification practice in both research and industry for software quality assurance. Experiment setups for performance assessment tend to be complex. A typical experiment needs to be run for a variety of involved hardware, software versions, system settings and input parameters. Typical approaches for performance assessment are based on scripts. They do not document all variants explicitly, which makes it hard to analyze and reproduce experiment results correctly. In general they tend to be monolithic which makes it hard to extend experiment setups systematically and to reuse features such as result storage and analysis consistently across experi- ments. In this paper, we present a generic approach and a DSL-based framework for performance assessment. The DSL helps the user to set and organize the variants in an experiment setup explicitly. The Runtime module in our framework executes experiments after which results are stored together with the corresponding setups in a database. Database queries provide easy access to the results of previous experiments and the correct analysis of experiment results in context of the experiment setup. Furthermore, we describe operations for common problems in performance assessment such as outlier detection. At Oracle, we successfully instantiate the framework and use it to nightly assess the performance of PGX [12, 6], a toolkit for parallel graph analytics.



Design Space Exploration of Power Delivery For Advanced Packaging Technologies

***Note: this is work the VLSI Research group did with Prof. Bakir back in 2017. The student is now graduating, and wants to finalize/publish this work.*** In this paper, a design space exploration of power delivery networks is performed for multi-chip 2.5-D and 3-D IC technologies. The focus of the paper is the effective placement of the voltage regulator modules (VRMs) for power supply noise (PSN) suppression. Multiple on-package VRM configurations have been analyzed and compared. Additionally, 3D IC chipon-VRM and backside-of-the-package VRM configurations are studied. From the PSN perspective, the 3D IC chip-on-VRM case suppresses the PSN the most even with high current density hotspots. The paper also studies the impact of different parameters such as VRM-chip distance on the package, on-chip decoupling capacitor density, etc. on the PSN.



GraalVM Slides for JAX London

These are intro-level slides to be presented at



Computers and Hacking: A 50-Year View

[Slides for a 20-minute keynote talk for the MIT HackMIT hackathon weekend, Saturday, September 14, 2019.] Fifty years ago, computers were expensive, institutional rather than personal, and hard to get access to. Today computers are relatively inexpensive, personal well as institutional, and ubiquitous. Some of the best hacking—and engineering—today involves creative use of relatively limited (and therefore cost-effective) computer resources.



The Impact of RDMA on Agreement.

Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires n \geq 2f_P + 1 processes (where f_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires n \geq f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.


Hardware and Software, Engineered to Work Together