Improving application performance down to the scheduler level

Principal Investigator

Inria

Oracle Fellowship Recipient

Damien Carver

Oracle Principal Investigator

Greg Marsden
Jean-Pierre Lozi, Principal Member of Technical Staff

Summary

Recent research and experience at Oracle have shown that operating system schedulers are often a performance bottleneck on multicore architectures: to scale, schedulers cannot make optimal decisions and must instead rely on heuristics. To better analyze scheduler behavior, this ERO project developed efficient profiling tools for the Linux scheduler over the previous year. In particular, these tools made it possible to identify a scheduling issue on multicore machines with per-core frequency scaling, in which a high-frequency, “hot” core becomes idle while a newly created or newly woken task is placed on a low-frequency, “cold” core. Such frequency inversion is common in many workloads, including producer-consumer applications and shell scripts that create many processes through the standard POSIX fork/wait system calls. This led us to design scheduling techniques that improve the performance of workloads that exhibit recurrent frequency inversion while having minimal impact on other workloads. These findings were presented in a paper published at PLOS '19 and in another paper that is currently under review.
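To make the fork/wait pattern concrete, here is a minimal sketch in Python of the kind of workload described above. It is an illustration only, not code from the project: the function name `run_fork_wait` and the amount of work done in each child are assumptions. The parent creates one short-lived child at a time and blocks until it exits, so the parent's core goes idle each time a child runs.

```python
import os


def run_fork_wait(n_children):
    """Fork n_children short-lived children one at a time, waiting for each.

    Hypothetical helper for illustration; returns the children's exit codes.
    """
    exit_codes = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: do a small amount of work, then exit without returning
            # into the parent's code path.
            try:
                sum(range(10_000))
            finally:
                os._exit(0)
        # Parent: block until the child finishes. While it waits, the core it
        # was running on becomes idle; the scheduler may have placed the child
        # on a different core running at a lower frequency.
        _, status = os.waitpid(pid, 0)
        exit_codes.append(os.WEXITSTATUS(status))
    return exit_codes


if __name__ == "__main__":
    print(run_fork_wait(5))
```

Whether frequency inversion actually occurs when running this sketch depends on the hardware, the frequency governor, and the scheduler's placement decisions; the code only reproduces the process-creation pattern.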

For the upcoming year, the main objective of this project is to write a frequency-aware version of the Linux scheduler that focuses on improving performance, energy consumption, and/or the stability of application execution times. This endeavor will require large-scale changes to the Linux scheduler. Our profiling tools, SchedLog and SchedDisplay, will therefore play a critical role, as they make it possible to quickly understand the low-level impact of each change during development. A secondary objective for the upcoming year is to improve SchedLog and SchedDisplay in the process. In particular, we would like them to automatically detect specific cases of poor scheduling and to pinpoint which patterns of scheduling decisions led to them.
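As a rough illustration of what automatic detection of poor scheduling could look like, the sketch below flags placement decisions in which a task was put on a core that was markedly slower than an idle core available at that moment. Everything here is hypothetical: the `Placement` record, its fields, the `find_frequency_inversions` function, and the 500 MHz threshold are assumptions made for the example and do not reflect the actual data format or features of SchedLog or SchedDisplay.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record of one task-placement decision."""
    task: str
    chosen_core: int
    core_freq_mhz: dict  # core id -> frequency when the decision was made
    idle_cores: set      # cores idle (or about to become idle) at that time


def find_frequency_inversions(placements, min_gap_mhz=500):
    """Return (task, chosen_core, faster_idle_core) for suspicious placements.

    A placement is flagged when some other idle core ran at least
    min_gap_mhz faster than the core the task was placed on.
    """
    flagged = []
    for p in placements:
        chosen_freq = p.core_freq_mhz[p.chosen_core]
        faster_idle = [
            core for core in p.idle_cores
            if core != p.chosen_core
            and p.core_freq_mhz[core] - chosen_freq >= min_gap_mhz
        ]
        if faster_idle:
            # Report the fastest idle core that was passed over.
            best = max(faster_idle, key=lambda core: p.core_freq_mhz[core])
            flagged.append((p.task, p.chosen_core, best))
    return flagged


if __name__ == "__main__":
    trace = [
        Placement("child", 1, {0: 3600, 1: 1200}, {0, 1}),
        Placement("other", 0, {0: 3600, 1: 1200}, {0, 1}),
    ]
    print(find_frequency_inversions(trace))
```

A real detector would also need to account for how quickly core frequencies ramp up and for the scheduler's other placement constraints; this sketch considers only a single snapshot per decision.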