An early look at the LDBC social network benchmark's business intelligence workload

In this short paper, we provide an early look at the LDBC Social Network Benchmark's Business Intelligence (BI) workload, which tests graph data management systems on a graph business analytics workload. Its queries involve complex aggregations and navigations (joins) that touch large data volumes, as is typical in BI workloads, yet they depend heavily on graph functionality such as connectivity tests and path finding. We outline the motivation for this new benchmark, which we derived from many interactions with the graph database industry and its users, and situate it in a scenario of social network analysis. The workload was designed by taking into account technical "chokepoints" identified by database system architects from academia and industry, which we also describe and map to the queries. We present reference implementations in openCypher, PGQL, SPARQL, and SQL, and preliminary results of SNB BI on a number of graph data management systems.



Analytics with smart arrays: adaptive and efficient language-independent data

This paper introduces smart arrays, an abstraction for providing adaptive and efficient language-independent data storage. Their smart functionalities include NUMA-aware data placement across sockets and bit compression. We show how our single C++ implementation can be used efficiently from both native C++ and compiled Java code. We experimentally evaluate smart arrays on a diverse set of C++ and Java analytics workloads. Further, we show how their smart functionalities affect performance and lead to differences in hardware resource demands on multi-core machines, motivating the need for adaptivity. We observe that smart arrays can significantly decrease the memory space requirements of analytics workloads, and improve their performance by up to 4×. Smart arrays are the first step towards general smart collections with various smart functionalities that enable the consumption of hardware resources to be traded off against one another.
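
To make the idea of bit compression concrete, the following is a minimal sketch of a fixed-width bit-packed integer array. It is an illustration only and not the paper's C++ implementation: the class name `BitPackedArray`, its methods, and the 64-bit word layout are assumptions made for this example, and NUMA-aware placement is not modeled at all.

```python
class BitPackedArray:
    """Toy fixed-width bit-packed array of non-negative integers.

    Hypothetical illustration: each element occupies `bits` bits inside a
    list of simulated 64-bit words, so elements may straddle two words.
    """

    WORD_BITS = 64
    WORD_MASK = (1 << WORD_BITS) - 1

    def __init__(self, length, bits):
        if not 1 <= bits <= self.WORD_BITS:
            raise ValueError("bits must be between 1 and 64")
        self.length = length
        self.bits = bits
        self.mask = (1 << bits) - 1
        # Round up to the number of 64-bit words needed for all elements.
        n_words = (length * bits + self.WORD_BITS - 1) // self.WORD_BITS
        self.words = [0] * n_words

    def set(self, i, value):
        if not 0 <= i < self.length:
            raise IndexError(i)
        if not 0 <= value <= self.mask:
            raise ValueError("value does not fit in the configured bit width")
        word, offset = divmod(i * self.bits, self.WORD_BITS)
        # Clear the element's bits in the first word, then write the low part.
        cleared = self.words[word] & ~(self.mask << offset) & self.WORD_MASK
        self.words[word] = cleared | ((value << offset) & self.WORD_MASK)
        spill = offset + self.bits - self.WORD_BITS
        if spill > 0:
            # The element straddles a word boundary: write the high part.
            low_bits = self.bits - spill
            nxt = self.words[word + 1] & ~((1 << spill) - 1)
            self.words[word + 1] = nxt | (value >> low_bits)

    def get(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        word, offset = divmod(i * self.bits, self.WORD_BITS)
        value = self.words[word] >> offset
        spill = offset + self.bits - self.WORD_BITS
        if spill > 0:
            # Stitch in the high part stored in the next word.
            value |= self.words[word + 1] << (self.bits - spill)
        return value & self.mask
```

With 13-bit elements, 100 values fit in 21 simulated 64-bit words instead of 100. The price is extra shifting and masking on every access, which is the kind of memory-versus-compute trade-off the abstract alludes to.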



An NVM Carol

Around 2010, we observed significant research activity around the development of non-volatile memory technologies. Shortly thereafter, other research communities began considering the implications of non-volatile memory on system design, from storage systems to data management solutions to entire systems. Finally, in July 2015, Intel and Micron Technology announced 3D XPoint. It’s now 2018; Intel is shipping its technology in SSD packages, but we’ve not yet seen the widespread availability of byte-addressable non-volatile memory that resides on the memory bus. We can view non-volatile memory technology and its impact on systems through a historical lens, revealing it as the convergence of several past research trends: starting with the concept of single-level store, encompassing the 1980s excitement around bubble memory, building upon persistent object systems, and leveraging recent work in transactional memory. We present this historical context, recalling past ideas that seem particularly relevant and potentially applicable, and highlighting aspects that are novel.



Persistent Memory Transactions

This paper presents a comprehensive analysis of performance trade-offs between implementation choices for transaction runtime systems on persistent memory. We compare three implementations of transaction runtimes: undo logging, redo logging, and copy-on-write. We also present a memory allocator that plugs into these runtimes. Our microbenchmark-based evaluation focuses on understanding the interplay between various factors that contribute to performance differences between the three runtimes -- read/write access patterns of workloads, size of the persistence domain (the portion of the memory hierarchy where the data is effectively persistent), cache locality, and transaction runtime bookkeeping overheads. No single runtime emerges as a clear winner. We confirm our analysis in more realistic settings of three "real world" applications we developed with our transactional API: (i) a key-value store we implemented from scratch, (ii) a SQLite port, and (iii) a persistified version of memcached, a popular key-value store. These findings are not only consistent with our microbenchmark analysis, but also provide additional insights into other factors (e.g., effects of multithreading and synchronization) that affect application performance.
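
As a rough illustration of two of the three strategies named above, the sketch below contrasts undo logging (update in place, remember old values) with redo logging (buffer new values, apply at commit). It is a toy model under stated assumptions: a Python dict stands in for persistent memory, the class names `UndoLogTx` and `RedoLogTx` are hypothetical, and the cache flushes, fences, and crash recovery that a real persistent memory runtime needs are omitted entirely.

```python
class UndoLogTx:
    """Toy undo-logging transaction: writes go in place, old values are logged."""

    def __init__(self, store):
        self.store = store
        self.log = []  # entries of (key, old_value, key_existed)

    def write(self, key, value):
        # Record the old state before overwriting it in place.
        self.log.append((key, self.store.get(key), key in self.store))
        self.store[key] = value

    def read(self, key):
        # Reads see in-place updates directly; no log lookup is needed.
        return self.store[key]

    def commit(self):
        # Data is already in place; committing just discards the log.
        self.log.clear()

    def abort(self):
        # Roll back by replaying the log in reverse order.
        for key, old, existed in reversed(self.log):
            if existed:
                self.store[key] = old
            else:
                del self.store[key]
        self.log.clear()


class RedoLogTx:
    """Toy redo-logging transaction: writes are buffered and applied at commit."""

    def __init__(self, store):
        self.store = store
        self.log = {}  # key -> new value

    def write(self, key, value):
        # The store is untouched until commit.
        self.log[key] = value

    def read(self, key):
        # Reads must consult the log first to observe the transaction's writes.
        if key in self.log:
            return self.log[key]
        return self.store[key]

    def commit(self):
        # Apply all buffered writes to the store.
        self.store.update(self.log)
        self.log.clear()

    def abort(self):
        # Nothing was applied, so aborting only drops the buffered writes.
        self.log.clear()
```

Even in this toy form the asymmetry is visible: undo logging makes reads cheap and aborts expensive, while redo logging makes aborts trivial but adds a lookup to every read. That is consistent with the abstract's point that read/write access patterns influence which runtime performs better.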



Dominance-Based Duplication Simulation (DBDS)

Compilers perform a variety of advanced optimizations to improve the quality of the generated machine code. However, optimizations that depend on the data flow of a program are often limited by control flow merges. Code duplication can solve this problem by hoisting, i.e., duplicating, instructions from merge blocks to their predecessors. However, finding optimization opportunities enabled by duplication is a non-trivial task that requires compile-time intensive analysis. This imposes a challenge on modern (just-in-time) compilers: duplicating instructions tentatively at every control flow merge is not feasible because excessive duplication leads to uncontrolled code growth and compile time increases. Therefore, compilers need to find out whether a duplication is beneficial enough to be performed. This paper proposes a novel approach to determine which duplication operations should be performed to increase performance. The approach is based on a duplication simulation that enables a compiler to evaluate different success metrics per potential duplication. Using this information, the compiler can then select the most promising candidates for optimization. We show how to map duplication candidates into an optimization cost model that allows us to trade off between different success metrics including peak performance, code size and compile time. We implemented the approach on top of the GraalVM and evaluated it with the benchmarks Java DaCapo, Scala DaCapo, JavaScript Octane and a micro-benchmark suite, in terms of performance, compilation time and code size increase.
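
The selection step described above can be pictured with a small sketch. The abstract does not spell out the paper's actual cost model, so everything here is an assumption made for illustration: the `Candidate` fields, the benefit-per-size ranking, and the greedy selection under a code size budget are hypothetical stand-ins, not the heuristic implemented in GraalVM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A simulated duplication with hypothetical, precomputed estimates."""
    name: str
    benefit: float  # assumed estimate of run-time work saved
    size_cost: int  # assumed estimate of additional code size


def select_duplications(candidates, size_budget):
    """Greedily pick candidates by benefit per unit of code growth.

    Illustrative stand-in for a cost model: candidates with no estimated
    benefit are skipped, and the total code growth never exceeds the budget.
    """
    ranked = sorted(
        candidates,
        key=lambda c: c.benefit / max(c.size_cost, 1),
        reverse=True,
    )
    chosen = []
    remaining = size_budget
    for cand in ranked:
        if cand.benefit <= 0:
            continue  # simulation found nothing to gain from this duplication
        if cand.size_cost <= remaining:
            chosen.append(cand)
            remaining -= cand.size_cost
    return chosen
```

The point of the sketch is the workflow: estimates are gathered for every candidate first (the simulation), and only then are the most promising duplications actually performed.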



Self-managed collections: Off-heap memory management for scalable query-dominated collections

Explosive growth in DRAM capacities and the emergence of language-integrated query enable a new class of managed applications that perform complex query processing on huge volumes of data stored as collections of objects in the memory space of the application. While more flexible in terms of schema design and application development, this approach typically experiences sub-par query execution performance when compared to specialized systems like DBMS. To address this issue, we propose self-managed collections, which utilize off-heap memory management and dynamic query compilation to improve the performance of querying managed data through language-integrated query. We evaluate self-managed collections using both microbenchmarks and enumeration-heavy queries from the TPC-H business intelligence benchmark. Our results show that self-managed collections outperform ordinary managed collections in both query processing and memory management by up to an order of magnitude and even outperform an optimized in-memory columnar database system for the vast majority of queries.

