Trash Day: Coordinating Garbage Collection in Distributed Systems
Trash Day: Coordinating Garbage Collection in Distributed Systems
18 May 2015
Cloud systems such as Hadoop, Spark and Zookeeper are frequently written in Java or other garbage-collected languages. However, GC-induced pauses can have a signifi- cant impact on these workloads. Specifically, GC pauses can reduce throughput for batch workloads, and cause high tail-latencies for interactive applications. In this paper, we show that distributed applications suffer from each node’s language runtime system making GC-related decisions independently. We first demonstrate this problem on two widely-used systems (Apache Spark and Apache Cassandra). We then propose solving this problem using a Holistic Runtime System, a distributed language runtime that collectively manages runtime services across multiple nodes. We present initial results to demonstrate that this Holistic GC approach is effective both in reducing the impact of GC pauses on a batch workload, and in improving GC-related tail-latencies in an interactive setting.
Venue : HotOS: USENIX Workshop on Hot Topics in Operating Systems