Many distributed workloads in today’s data centers are written
in managed languages such as Java or Ruby. Examples
include big data frameworks such as Hadoop, data stores
such as Cassandra, and applications such as the SOLR search
engine. These workloads typically run across many independent
language runtime systems on different nodes.
This setup represents a source of inefficiency, as these
language runtime systems are unaware of each other. For
example, they may perform Garbage Collection at times that
are reasonable locally but poorly chosen for the distributed
application as a whole.
We address these problems by introducing the concept
of a Holistic Runtime System that makes runtime-level decisions
for the entire distributed application rather than locally.
We then present Taurus, a Holistic Runtime System
prototype. Taurus is a drop-in replacement for the JVM, requires almost
no configuration, and can run unmodified off-the-shelf
Java applications. Taurus enforces user-defined coordination
policies and provides a DSL for writing these policies.
By applying Taurus to Garbage Collection, we demonstrate
the potential of such a system and use it to explore
coordination strategies for the runtime systems of real-world
distributed applications, with the goal of improving application performance
and addressing tail latencies in latency-sensitive workloads.