An interconnect for a high-performance cluster must be optimized for
both high throughput and low latency. To avoid the tradeoff
between throughput and latency, the cluster interconnect Clint has a
segregated architecture that provides two physically separate transmission
channels: a bulk channel optimized for high-bandwidth traffic and a quick
channel optimized for low-latency traffic. The two channels apply different
scheduling strategies. The bulk channel uses a scheduler that globally
allocates time slots on the transmission paths before packets are sent. In
this way,
collisions as well as blockages are avoided. In contrast, the quick channel
takes a best-effort approach, sending packets as soon as they are available
and thereby risking collisions and retransmissions.
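
The contrast between the two scheduling strategies can be illustrated with a
minimal sketch. It is not Clint's actual scheduler: the function names, the
greedy first-come allocation, and the rule that all packets colliding on a
destination are lost are assumptions made for illustration only. Packets are
modeled as (source, destination) pairs for a single time slot.

```python
from collections import Counter


def schedule_bulk_slot(requests):
    """Grant a conflict-free subset of (src, dst) requests for one time slot.

    Hypothetical greedy allocation: a request is granted only if neither its
    source nor its destination is already used in this slot, so granted
    packets cannot collide or block one another.
    """
    busy_src, busy_dst = set(), set()
    granted = []
    for src, dst in requests:
        if src not in busy_src and dst not in busy_dst:
            granted.append((src, dst))
            busy_src.add(src)
            busy_dst.add(dst)
    return granted


def send_quick(packets):
    """Best-effort transmission: every packet is sent without coordination.

    Assumption for this sketch: packets that share a destination in the same
    slot all collide and must be retransmitted.
    """
    per_dst = Counter(dst for _, dst in packets)
    delivered = [(s, d) for s, d in packets if per_dst[d] == 1]
    collided = [(s, d) for s, d in packets if per_dst[d] > 1]
    return delivered, collided


if __name__ == "__main__":
    requests = [(0, 1), (2, 1), (1, 0), (0, 2)]
    print("bulk grants:", schedule_bulk_slot(requests))

    delivered, collided = send_quick([(0, 1), (2, 1), (1, 0)])
    print("quick delivered:", delivered)
    print("quick collided:", collided)
```

In this sketch the bulk path pays for coordination before sending but never
loses a packet, whereas the quick path sends immediately and recovers from
collisions by retransmission.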
Clint specifically targets small- to medium-sized clusters that offer
a low-cost alternative to symmetric multiprocessor (SMP) systems. This
design point allows for a simple and cost-effective implementation. In
particular, packets are buffered only on the hosts, and the switches need
no buffer memory. This simplifies protocols, because switch forwarding
delays are fixed, and optimizes throughput, because a global schedule
becomes possible.
This report is an extended version of a paper presented at SC2002,
Baltimore, Maryland, November 2002.