Computers and Hacking: A 50-Year View

[Slides for a 20-minute keynote talk for the MIT HackMIT hackathon weekend, Saturday, September 14, 2019.] Fifty years ago, computers were expensive, institutional rather than personal, and hard to get access to. Today computers are relatively inexpensive, personal well as institutional, and ubiquitous. Some of the best hacking—and engineering—today involves creative use of relatively limited (and therefore cost-effective) computer resources.



What is a Secure Programming Language? (lecture + tutorial)

Lecture and tutorial using GraalVM and Simple Language.



What is a Secure Programming Language?

Our most sensitive and important software systems are written in programming languages that are inherently insecure, making the security of the systems themselves extremely challenging. It is often said that these systems were written with the best tools available at the time, so over time with newer languages will come more security. But we contend that all of today's mainstream programming languages are insecure, including even the most recent ones that come with claims that they are designed to be ``secure''. Our real criticism is the lack of a common understanding of what ``secure'' might mean in the context of programming language design. We propose a simple data-driven definition for a secure programming language: that it provides first-class language support to address the causes for the most common, significant vulnerabilities found in real-world software. To discover what these vulnerabilities actually are, we have analysed the National Vulnerability Database and devised a novel categorisation of the software defects reported in the database. This leads us to propose three broad categories, which account for over 50\% of all reported software vulnerabilities, that \emph{as a minimum} any secure language should address. While most mainstream languages address at least one of these categories, interestingly, we find that none address all three. Looking at today's real-world software systems, we observe a paradigm shift in design and implementation towards service-oriented architectures, such as microservices. Such systems consist of many fine-grained processes, typically implemented in multiple languages, that communicate over the network using simple web-based protocols, often relying on multiple software environments such as databases. In traditional software systems, these features are the most common locations for security vulnerabilities, and so are often kept internal to the system. In microservice systems, these features are no longer internal but external, and now represent the attack surface of the software system as a whole. The need for secure programming languages is probably greater now than it has ever been.



Non-Volatile Memory and Java: Part 3

A series of short articles about the impact of non-volatile memory (NVM) on the Java platform. In the first two articles I described the main hardware and software characteristics of Intel’s new Optane persistent memory. In this article I will discuss the implications of these characteristics on how we build software.



Non-volatile memory and Java, part 2: the view from software

In the first article I described the main hardware characteristics of Intel’s new Optane persistent memory. In this article I will discuss several software issues.



Non-volatile memory and Java: 1. Introducing NVM

Non-volatile RAM (NVRAM) has arrived into the computing mainstream. This development is likely to be highly disruptive: it will change the economics of the memory hierarchy by providing a new, intermediate level between DRAM and flash, but fully exploiting the new technology will require widespread changes in how we architect and write software. Despite this, there is surprisingly little awareness on the part of programmers (and their management) of the technology and its likely impact, and relatively little activity in academia (compared to the magnitude of the paradigm shift) in developing techniques and tools which programmers will need to respond to the change. In this series I will discuss the possible impact of NVRAM on the Java ecosystem. Java is the most widely used programming language: there are millions of Java developers and billions of lines of Java code in daily use.



Brief Announcement: Persistent Multi-Word Compare-and-Swap

This brief announcement presents a fundamental concurrent primitive for persistent memory – a persistent atomic multi-word compare-and-swap (PMCAS).We present a novel algorithm carefully crafted to ensure that atomic updates to a multitude of words modified by the PMCAS are persisted correctly. Our algorithm leverages hardware transactional memory (HTM) for concurrency control, and has a total of 3 persist barriers in its critical path. We also overview variants based on just the compare-and-swap (CAS) instruction and a hybrid of CAS and HTM.



A Two-List Framework for Accurate Detection of Frequent Items in Data Streams

The problem of detecting the most frequent items in large data sets and providing accurate frequency estimates for those items is becoming more and more important in a variety of domains. We propose a new two-list framework for addressing this problem, which extends the state-of-the-art Filtered Space-Saving (FSS) algorithm. An algorithm called FSSA giving an efficient array-based implementation of this framework is presented. An adaptive version of this algorithm is also presented, which adjusts the relative sizes of the two lists based on the estimated number of distinct keys in the data set. Analytical comparison with the FSS algorithm showed that FSSA has smaller expected frequency esti-mation errors, and experiments on both artificial and real workloads confirm this result. A theoretical analysis of space and time complexity for FSSA and its benchmark algorithms was performed. Finally, we showed that FSS2L frame-work can be naturally parallelized, leading to a linear decrease in the maximum frequency estimation error.



Closing the Performance Gap Between Volatile and Persistent Key-Value Stores Using Cross-Referencing Logs

Key-Value (K-V) stores are an integral building block of modern datacenter applications. With byte addressable persistent memory (PM) technologies, such as Intel/Micron’s 3D XPoint, on the horizon, there has been an influx of new high performance K-V stores that leverage PM for performance. However, there remains a significant performance gap between PM optimized K-V stores and DRAM resident ones, largely reflecting the gap between projected PM latency relative to that of DRAM. We address that performance gap with Bullet, a K-V store that leverages both the byte-addressability of PM and the lower latency of DRAM, using a technique called cross-referencing logs (CRLs) to keep most PM updates off the critical path. Bullet delivers performance approaching that of DRAM resident K-V stores by maintaining two hash tables, one in the slower (backend) PM and the other in the faster (frontend) DRAM. CRLs are a scalable persistent logging mechanism that keeps the two copies mutually consistent. Bullet also incorporates several critical optimizations, such as dynamic load balancing between frontend and backend threads, support for nonblocking Gets, and opportunistic omission of stale updates in the backend. This combination of implementation techniques delivers performance within 5% of that of DRAM-only key-value stores for realistic (read-heavy) workloads. Our general approach, based on CRLs, is “universal” in that it can be used to turn any volatile K-V store into a persistent one (or vice-versa, provide a fast cache for a persistent K-V store).



Placement of Virtual Containers on NUMA systems: A Practical and Comprehensive Model

Our work addresses the problem of placement of threads, or virtual cores, onto physical cores in a multicore NUMA system. Different placements result in varying degrees of contention for shared resources, so choosing the right placement can have a large effect on performance. Prior work has studied this problem, but either addressed hardware with specific properties, leaving us unable to generalize the models to other systems, or modeled much simpler effects than the actual performance in different placements. Our contribution is a general framework for reasoning about workload placement on machines with shared resources. It enables us to build an accurate performance model for any machine with a hierarchy of known shared resources automatically, with only minimal input from the user. Using our methodology, data center operators can minimize the number of NUMA (CPU+memory) nodes allocated for an application or a service, while ensuring that it meets performance objectives.


Hardware and Software, Engineered to Work Together