Taming Multi-GPU Greedy Scheduling Through a Polyglot Runtime

Marco Arnaboldi, Arnaud Delamare, Daniele Bonetta, Guido Di Donato, Alberto Parravicini, Ian Di Dio, Marco Santambrogio

30 October 2023

Multi-GPU systems are increasingly being deployed in cloud data centers, but using GPUs efficiently from high-level programming languages remains a challenge. Moreover, exploiting the full capabilities of multi-GPU systems is an arduous task because of the complex interconnection topology between the available accelerators and the variety of inter-GPU communication patterns exhibited by different workloads. This work introduces a novel scheduler for multi-task GPU computations that provides transparent asynchronous execution on multi-GPU systems without requiring prior information about the program dependencies or the underlying system architecture. Our scheduler integrates with the polyglot GraalVM ecosystem and is therefore available for multiple high-level languages, providing a general framework that can significantly lower the barriers to entry to multi-GPU acceleration. We validate our work on a set of benchmarks designed to investigate scalability and inter-GPU communication. Experimental results show that our scheduler automatically achieves 80-90% of the peak performance of hand-optimized CUDA host code on Volta and Ampere multi-GPU systems.
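The abstract does not spell out the scheduling algorithm, so the following is only a toy illustration of the general idea it names: inferring dependencies between GPU tasks at launch time, with no prior knowledge of the program, and greedily choosing a device for each task. Everything here is an assumption made for illustration: the `ToyScheduler` class, the `launch` method, the read/write argument sets, and the placement rule (prefer the device that already holds most of the task's arrays, then the least loaded one) are hypothetical and are not the paper's implementation or its API. No GPU is involved; the sketch only does the bookkeeping.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    reads: set
    writes: set
    deps: set = field(default_factory=set)  # names of tasks this one must wait for
    device: int = -1


class ToyScheduler:
    """Hypothetical greedy scheduler sketch; not the paper's implementation."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.last_writer = {}  # array name -> task that last wrote it
        self.readers = {}      # array name -> tasks that read it since the last write
        self.location = {}     # array name -> device holding the latest written copy
        self.load = [0] * num_devices
        self.tasks = []

    def launch(self, name, reads=(), writes=()):
        task = Task(name, set(reads), set(writes))
        args = task.reads | task.writes

        # Read-after-write and write-after-write: wait for the last writer.
        for arr in args:
            if arr in self.last_writer:
                task.deps.add(self.last_writer[arr])
        # Write-after-read: wait for tasks still reading the old value.
        for arr in task.writes:
            task.deps |= self.readers.get(arr, set())

        # Greedy placement (assumed rule): most arguments already resident,
        # ties broken by lowest load, then by lowest device index.
        def score(dev):
            resident = sum(1 for a in args if self.location.get(a) == dev)
            return (-resident, self.load[dev], dev)

        task.device = min(range(self.num_devices), key=score)
        self.load[task.device] += 1

        # Update bookkeeping so later launches see this task.
        for arr in task.reads:
            self.readers.setdefault(arr, set()).add(name)
        for arr in task.writes:
            self.last_writer[arr] = name
            self.readers[arr] = set()
            self.location[arr] = task.device

        self.tasks.append(task)
        return task


if __name__ == "__main__":
    sched = ToyScheduler(num_devices=2)
    sched.launch("init_x", writes=["x"])
    sched.launch("init_y", writes=["y"])
    sched.launch("square_x", reads=["x"], writes=["x"])
    sched.launch("square_y", reads=["y"], writes=["y"])
    sched.launch("sum", reads=["x", "y"], writes=["z"])
    for t in sched.tasks:
        print(t.name, "device", t.device, "waits for", sorted(t.deps))
```

In this sketch the two independent chains on `x` and `y` land on different devices and can overlap, while `sum` waits for both. A real scheduler would also have to map tasks to streams, insert synchronization events, and account for the inter-GPU transfer costs that the abstract mentions.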


Venue: IEEE Transactions on Parallel and Distributed Systems

File Name: GrCUDA_MultiGPU___TPDS.pdf