A General Model for Placement of Workloads on Multicore NUMA Systems

22 April 2017

The problem of mapping threads, or virtual cores, to physical cores on multicore systems has been studied for over a decade. Despite this effort, there is still no method that can decide, in real time and for arbitrary workloads, the relative impact of different mappings on performance. Prior work has made large strides in this field, but these solutions addressed a limited set of concerns (e.g., only shared caches and memory controllers, or only asymmetric interconnects), assumed hardware with specific properties, and therefore cannot be generalized to other systems. Our contribution is an abstract machine model that enables us to automatically build a performance prediction model for any machine with a hierarchy of shared resources. In the process of developing the methodology for building predictive models, we discovered pitfalls of hardware performance counters, the de facto technique embraced by the community for decades. Our new methodology does not rely on hardware counters; the cost is trying a handful of additional workload mappings (out of many possible) at runtime. Using this methodology, data center operators can decide on the smallest number of NUMA (CPU+memory) nodes to use for a target application or service (which we assume is encapsulated in a virtual container, matching the reality of modern cloud systems such as AWS) while still meeting performance goals. More broadly, the methodology empowers them to efficiently “pack” virtual containers onto the physical hardware in a data center.
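The search the abstract describes — trying a handful of candidate mappings at runtime instead of predicting from hardware counters, then keeping the smallest NUMA node count that meets the performance goal — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function names, the measurement stub, and the diminishing-returns throughput curve are all assumptions for illustration.

```python
# Hypothetical sketch of sampling-based NUMA placement: try a handful of
# candidate node counts at runtime, measure each, and keep the cheapest
# configuration that still meets the performance goal.

def measure_throughput(num_nodes):
    # Stand-in for actually running the containerized workload pinned to
    # `num_nodes` NUMA nodes and measuring its throughput. Here we fake a
    # diminishing-returns curve purely for demonstration.
    return 100.0 * (1 - 0.5 ** num_nodes)

def smallest_node_count(max_nodes, goal):
    """Return the fewest NUMA nodes whose measured throughput meets `goal`,
    or None if even all `max_nodes` nodes fall short."""
    for n in range(1, max_nodes + 1):  # a handful of mappings, not all
        if measure_throughput(n) >= goal:
            return n
    return None

print(smallest_node_count(4, goal=90.0))
```

In a real deployment the measurement step would run the actual container briefly under each candidate mapping; the point of the sketch is only the structure of the search, which needs no counter data.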


Venue : ACM Symposium on Operating Systems Principles (SOSP 2017)

File Name : sosp17-paper196.pdf