News


25-APR-2018

Analytics with smart arrays: adaptive and efficient language-independent data

This paper introduces smart arrays, an abstraction for providing adaptive and efficient language-independent data storage. Their smart functionalities include NUMA-aware data placement across sockets and bit compression. We show how our single C++ implementation can be used efficiently from both native C++ and compiled Java code. We experimentally evaluate smart arrays on a diverse set of C++ and Java analytics workloads. Further, we show how their smart functionalities affect performance and lead to differences in hardware resource demands on multi-core machines, motivating the need for adaptivity. We observe that smart arrays can significantly decrease the memory space requirements of analytics workloads, and improve their performance by up to 4×. Smart arrays are the first step towards general smart collections with various smart functionalities that enable the consumption of hardware resources to be traded off against one another.
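
To illustrate the bit-compression idea, here is a minimal sketch (our own, not the paper's API; the class name and word layout are assumptions): values that fit in k bits are packed contiguously into 64-bit words, trading extra shift/mask work on every access for a smaller memory footprint.

```java
// Sketch of bit compression: pack k-bit values into a long[].
// Illustrative only; smart arrays combine this with NUMA-aware placement.
final class BitPackedArray {
    private final long[] words;
    private final int bits;   // bits per element, assumed 1..32
    private final long mask;

    BitPackedArray(int length, int bits) {
        this.bits = bits;
        this.mask = (1L << bits) - 1;
        this.words = new long[(int) (((long) length * bits + 63) / 64)];
    }

    int get(int index) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = words[word] >>> shift;
        if (shift + bits > 64) {            // element straddles two words
            value |= words[word + 1] << (64 - shift);
        }
        return (int) (value & mask);
    }

    void set(int index, int v) {
        long bitPos = (long) index * bits;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        words[word] = (words[word] & ~(mask << shift)) | (((long) v & mask) << shift);
        if (shift + bits > 64) {            // spill the high bits into the next word
            long highMask = (1L << (shift + bits - 64)) - 1;
            words[word + 1] = (words[word + 1] & ~highMask) | (((long) v & mask) >>> (64 - shift));
        }
    }
}
```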

Read


17-APR-2018

An NVM Carol

Around 2010, we observed significant research activity around the development of non-volatile memory technologies. Shortly thereafter, other research communities began considering the implications of non-volatile memory for system design, from storage systems to data management solutions to entire systems. Finally, in July 2015, Intel and Micron Technology announced 3D XPoint. It’s now 2018; Intel is shipping its technology in SSD packages, but we’ve not yet seen the widespread availability of byte-addressable non-volatile memory that resides on the memory bus. We can view non-volatile memory technology and its impact on systems through a historical lens, revealing it as the convergence of several past research trends: starting with the concept of single-level store, encompassing the 1980s excitement around bubble memory, building upon persistent object systems, and leveraging recent work in transactional memory. We present this historical context, recalling past ideas that seem particularly relevant and potentially applicable, and highlighting aspects that are novel.

Read

09-MAR-2018

Persistent Memory Transactions

This paper presents a comprehensive analysis of performance trade-offs between implementation choices for transaction runtime systems on persistent memory. We compare three implementations of transaction runtimes: undo logging, redo logging, and copy-on-write. We also present a memory allocator that plugs into these runtimes. Our microbenchmark-based evaluation focuses on understanding the interplay between the various factors that contribute to performance differences between the three runtimes: read/write access patterns of workloads, size of the persistence domain (the portion of the memory hierarchy where the data is effectively persistent), cache locality, and transaction runtime bookkeeping overheads. No single runtime emerges as a clear winner. We confirm our analysis in the more realistic settings of three "real world" applications we developed with our transactional API: (i) a key-value store we implemented from scratch, (ii) a SQLite port, and (iii) a persistified version of memcached, a popular key-value store. These findings are not only consistent with our microbenchmark analysis, but also provide additional interesting insights into other factors (e.g., the effects of multithreading and synchronization) that affect application performance.
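
For intuition, here is a minimal sketch of the undo-logging approach (our own simplification, not the paper's runtime): before each in-place store, the old value is logged and made durable, so a crash can be rolled back. The `persist()` helper is hypothetical and merely stands in for the cache-line write-back and fence instructions real persistent-memory code would issue.

```java
import java.util.ArrayDeque;

// Illustrative undo-logging transaction over a simulated persistent heap.
final class UndoLogTx {
    private final long[] heap;                                    // pretend persistent memory
    private final ArrayDeque<long[]> undoLog = new ArrayDeque<>(); // entries: {index, oldValue}

    UndoLogTx(long[] heap) { this.heap = heap; }

    void store(int index, long newValue) {
        undoLog.push(new long[] { index, heap[index] });  // log the old value first...
        persist(undoLog.peek());                          // ...and make the log entry durable
        heap[index] = newValue;                           // then update in place
        persist(heap, index);
    }

    void commit() {
        undoLog.clear();   // a real runtime would durably mark the transaction committed
    }

    void abortOrRecover() {
        while (!undoLog.isEmpty()) {                      // roll back, newest entry first
            long[] entry = undoLog.pop();
            heap[(int) entry[0]] = entry[1];
            persist(heap, (int) entry[0]);
        }
    }

    private static void persist(Object... ignored) { /* flush + fence on real hardware */ }
}
```

Redo logging inverts this scheme (new values go to the log and are copied in place at commit), which shifts costs from the write path to the commit path; the paper's point is that which variant wins depends on the workload and persistence domain.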

Read

24-FEB-2018

Dominance-Based Duplication Simulation (DBDS)

Compilers perform a variety of advanced optimizations to improve the quality of the generated machine code. However, optimizations that depend on the data flow of a program are often limited by control-flow merges. Code duplication can solve this problem by hoisting, i.e. duplicating, instructions from merge blocks into their predecessors. However, finding the optimization opportunities enabled by duplication is a non-trivial task that requires compile-time-intensive analysis. This poses a challenge for modern (just-in-time) compilers: tentatively duplicating instructions at every control-flow merge is not feasible, because excessive duplication leads to uncontrolled code growth and longer compile times. Compilers therefore need to determine whether a duplication is beneficial enough to be performed. This paper proposes a novel approach to determining which duplication operations should be performed to increase performance. The approach is based on a duplication simulation that enables a compiler to evaluate different success metrics for each potential duplication. Using this information, the compiler can then select the most promising candidates for optimization. We show how to map duplication candidates into an optimization cost model that allows us to trade off different success metrics, including peak performance, code size, and compile time. We implemented the approach on top of the GraalVM and evaluated it on the Java DaCapo, Scala DaCapo, and JavaScript Octane benchmarks and a micro-benchmark suite, in terms of performance, compilation time, and code size increase.
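
A small example (ours, not taken from the paper) of the kind of opportunity duplication exposes: an operation placed after a merge cannot be folded, but its copies in the predecessors can.

```java
// Illustrative only: how duplicating a merge block into its predecessors
// turns a blocked optimization into two foldable ones.
class DuplicationExample {
    // Before: `factor` is constant on each branch, but the multiplication
    // sits after the control-flow merge, so it cannot be simplified.
    static int before(boolean flag, int x) {
        int factor;
        if (flag) { factor = 2; } else { factor = 3; }
        return x * factor;          // the merge blocks the optimization
    }

    // After duplication, each copy sees a constant and can be strength-reduced.
    static int after(boolean flag, int x) {
        if (flag) {
            return x * 2;           // can become x << 1
        } else {
            return x * 3;           // can become (x << 1) + x
        }
    }
}
```

The simulation-based approach estimates, per candidate, how much such folding a duplication would unlock before committing to the code growth it costs.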

Read

21-MAR-2017

Self-managed collections: Off-heap memory management for scalable query-dominated collections

Explosive growth in DRAM capacities and the emergence of language-integrated query enable a new class of managed applications that perform complex query processing on huge volumes of data stored as collections of objects in the memory space of the application. While more flexible in terms of schema design and application development, this approach typically experiences sub-par query execution performance when compared to specialized systems such as DBMSs. To address this issue, we propose self-managed collections, which utilize off-heap memory management and dynamic query compilation to improve the performance of querying managed data through language-integrated query. We evaluate self-managed collections using both microbenchmarks and enumeration-heavy queries from the TPC-H business intelligence benchmark. Our results show that self-managed collections outperform ordinary managed collections in both query processing and memory management by up to an order of magnitude, and even outperform an optimized in-memory columnar database system for the vast majority of queries.
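
A minimal sketch of the off-heap idea (the class, record layout, and query are our own illustration, not the paper's design): fixed-size records live in native memory outside the garbage-collected heap, so a query scan touches no Java objects and creates no GC pressure.

```java
import java.nio.ByteBuffer;

// Illustrative off-heap collection of (orderId, price) records.
final class OffHeapOrders {
    private static final int RECORD = 12;   // 4-byte int id + 8-byte long price
    private final ByteBuffer data;
    private int count;

    OffHeapOrders(int capacity) {
        data = ByteBuffer.allocateDirect(capacity * RECORD);  // native, off-heap memory
    }

    void add(int orderId, long price) {
        data.putInt(count * RECORD, orderId);
        data.putLong(count * RECORD + 4, price);
        count++;
    }

    // A hand-written scan standing in for a compiled language-integrated query:
    // roughly SELECT SUM(price) WHERE price > threshold.
    long sumAbove(long threshold) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            long price = data.getLong(i * RECORD + 4);
            if (price > threshold) sum += price;
        }
        return sum;
    }
}
```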

Read


02-NOV-2017

Generic Concurrency Restriction - slides for the Dagstuhl seminar

The slides provide an overview of our work on generic concurrency restriction, which has previously been cleared for publication.

View

31-OCT-2017

Process Independent AMS Circuit Generator Design Methodology

We present an update on BAG, a framework for the development of process-independent Analog and Mixed-Signal (AMS) circuit generators. Such generators are parametrized design procedures that produce schematics, layouts, and verification testbenches for a circuit given top-level specifications. We describe a universal AMS circuit verification framework built with BAG, and two new layout engines, XBase and Laygo, that enable process-independent layout generation. We have developed various complex circuit generators as driving examples, including a time-interleaved SAR ADC and a SerDes transceiver frontend, taped out in the TSMC 16nm FFC process. We also verify our claims of process portability by presenting layouts generated in various technology nodes, including TSMC 28nm, TSMC 16nm, TSMC 7nm, GlobalFoundries 45nm RF-SOI, and ST 28nm FD-SOI.

Read

30-OCT-2017

Sulong, and Thanks For All the Bugs: Finding Errors in C Programs by Abstracting from the Native Execution Model

In C, memory errors such as buffer overflows are among the most dangerous software errors; as we show, they are still on the rise. Current dynamic bug-finding tools that try to detect such errors are based on the low-level execution model of the machine. They insert additional checks in an ad-hoc fashion, which makes them prone to forgotten checks for corner cases. To address this issue, we devised a novel approach to finding bugs during the execution of a program. At the core of this approach lies an interpreter, written in a high-level language, that performs automatic checks (such as bounds checks, NULL checks, and type checks). By mapping C data structures to data structures of the high-level language, accesses are automatically checked and bugs are found. We implemented this approach and show that our tool (called Safe Sulong) can find bugs that have been overlooked by state-of-the-art tools, such as out-of-bounds accesses to the main function arguments. Additionally, we demonstrate that the overheads are low enough to make our tool practical, both during development and in production for safety-critical software projects.
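
The core idea can be sketched in a few lines (our own simplification, not Safe Sulong's code): if the interpreter represents a C allocation as a Java array, every access the interpreted program performs is bounds-checked by the JVM automatically, with no ad-hoc instrumentation to forget.

```java
// Illustrative: a C `int` buffer modeled as a safe high-level object.
final class I32Array {
    private final int[] values;        // backing store for a C int[] allocation

    I32Array(int length) { values = new int[length]; }

    int read(int index) {              // models `ptr[index]` in the interpreter
        return values[index];          // JVM throws ArrayIndexOutOfBoundsException
    }                                  // where native execution would silently corrupt memory

    void write(int index, int value) {
        values[index] = value;
    }
}
```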

Read

25-OCT-2017

It's Time for Secure Languages

Slides summarising data from the National Vulnerability Database for the past four years, pointing to the need for better language design.

View

25-OCT-2017

It's Time for Secure Languages (talk abstract)

Language designers and developers want better ways to write good code: languages designed with simpler, more powerful abstractions accessible to a larger community of developers. However, language design does not seem to take security into account, leaving developers with the onerous task of writing attack-proof code. In 20 years, we have gone from 25 reported vulnerabilities to more than 6,000 vulnerabilities reported in a year. The top two types of vulnerabilities of the past few years have been known for more than 15 years. I’ll summarise data on vulnerabilities during 2013-2015 and argue that our languages must take security seriously. Languages need security-oriented constructs, and compilers must let developers know when there is a problem with their code. We need to empower developers with the concept of “security for the masses” by making available languages that do not require an expert to determine whether the code being written is vulnerable to attack.

View

22-OCT-2017

Making collection operations optimal with aggressive JIT compilation

Functional collection combinators are a neat and widely accepted data-processing abstraction. However, their generic nature results in high abstraction overheads: Scala collections are notoriously slow for typical tasks. We show that proper optimizations in a JIT compiler can largely eliminate the overheads imposed by these abstractions. Using the open-source Graal JIT compiler, we achieve speedups of up to 20x on collection workloads compared to the standard HotSpot C2 compiler. Consequently, a sufficiently aggressive JIT compiler allows the language compiler, such as Scalac, to focus on other concerns. In this paper, we show how optimizations such as inlining, polymorphic inlining, and partial escape analysis are combined in Graal to produce collections code that is optimal, or close to optimal, with respect to manually written code. We argue why some of these optimizations are more effectively done by a JIT compiler. We then identify specific use cases that most current JIT compilers do not optimize well, warranting special treatment from the language compiler.
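
The paper studies Scala collections; as a rough analogue in Java (our own example, using Java streams rather than the Scala API), the goal is for the combinator pipeline below to compile down to the same machine code as the hand-written loop.

```java
import java.util.stream.IntStream;

class PipelineVsLoop {
    // Combinator form: each stage is a lambda; without aggressive inlining
    // and partial escape analysis the pipeline pays abstraction overhead.
    static long combinators(int[] xs) {
        return IntStream.of(xs)
                .filter(x -> x % 2 == 0)
                .mapToLong(x -> (long) x * x)
                .sum();
    }

    // The loop an aggressive JIT should reduce the pipeline to after
    // inlining the combinators and eliminating intermediate allocations.
    static long manual(int[] xs) {
        long sum = 0;
        for (int x : xs) {
            if (x % 2 == 0) sum += (long) x * x;
        }
        return sum;
    }
}
```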

Read

13-OCT-2017

Design Considerations of Monolithically Integrated Voltage Regulators for Multicore Processors

Presented in this paper are design considerations for a Monolithically Integrated Voltage Regulator (MIVR) targeting a 42mm² multicore processor test chip taped out in a TSMC 28nm process. This is the first work discussing the use of on-die magnetic-core inductors to support >50A of load current. 64 inductors with a switching frequency of 140MHz are strategically grouped into 8 interleaved phases to achieve 85% efficiency and minimize on-die voltage drop.

Read

