One challenge for runtime systems like the Java™ platform that depend on
garbage collection is scaling performance with the number of allocating
threads. As the number of such threads grows, allocation of
memory in the heap becomes a point of contention. To relieve this
contention, many collectors allow threads to preallocate blocks of memory
from the shared heap. These per-thread local-allocation buffers (LABs)
allow threads to allocate most objects without any need for further
synchronization. As the number of threads exceeds the number of processors,
however, the cost of committing memory to local-allocation buffers becomes
a challenge, and sophisticated LAB-sizing policies must be employed.
To reduce this complexity, we implement support for local-allocation
buffers associated with processors instead of threads, using
multiprocessor restartable critical sections (MP-RCSs). MP-RCSs allow
threads to
manipulate processor-local data safely. To support processor-specific
transactions in dynamically generated code, we have developed a novel
mechanism for implementing these critical sections that is efficient,
allows preemption-notification at known points in a given critical section,
and does not require explicit registration of the critical sections.
Finally, we analyze the performance of per-processor LABs and show that,
for highly threaded applications, this approach performs better than
per-thread LABs, and allows for simpler LAB-sizing policies.
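
To make the baseline concrete, the following is a minimal sketch of a
per-thread LAB, not the implementation evaluated here. It models the heap
as a range of addresses, and all names (LabSketch, SharedHeap, Lab,
allocateBlock, the refills counter) are hypothetical. The shared heap
pointer is advanced with a compare-and-swap, which is the contended step;
allocation inside a LAB is an unsynchronized pointer bump.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model only: addresses are plain longs, no real memory is managed.
class LabSketch {

    /** Shared heap: a single top-of-heap pointer that all threads contend on. */
    static final class SharedHeap {
        private final AtomicLong top = new AtomicLong(0);
        private final long limit;

        SharedHeap(long limit) {
            this.limit = limit;
        }

        /** Synchronized path: carve a block off the heap; returns -1 if exhausted. */
        long allocateBlock(long size) {
            while (true) {
                long old = top.get();
                long next = old + size;
                if (next > limit) {
                    return -1;
                }
                if (top.compareAndSet(old, next)) {
                    return old;
                }
                // Another thread advanced the pointer first; retry.
            }
        }
    }

    /** A local-allocation buffer owned by exactly one thread. */
    static final class Lab {
        private final SharedHeap heap;
        private final long labSize;
        private long cursor = 0; // next free address in the current buffer
        private long end = 0;    // one past the last address of the current buffer
        int refills = 0;         // number of trips to the shared heap

        Lab(SharedHeap heap, long labSize) {
            this.heap = heap;
            this.labSize = labSize;
        }

        /** Fast path: bump the cursor. Slow path: fetch a new buffer from the heap. */
        long allocate(long size) {
            if (size > labSize) {
                return heap.allocateBlock(size); // large requests bypass the LAB
            }
            if (cursor + size > end) {
                long start = heap.allocateBlock(labSize);
                if (start < 0) {
                    return -1;
                }
                cursor = start;
                end = start + labSize;
                refills++;
            }
            long address = cursor;
            cursor += size;
            return address;
        }
    }

    public static void main(String[] args) {
        SharedHeap heap = new SharedHeap(1024);
        Lab lab = new Lab(heap, 256);
        for (int i = 0; i < 20; i++) {
            lab.allocate(16);
        }
        System.out.println("allocations=20 refills=" + lab.refills);
    }
}
```

In this model each thread holds one Lab, so the memory committed to
buffers grows with the thread count. A per-processor design would instead
index a Lab by processor, which requires the bump itself to be safe
against preemption; that is the role MP-RCSs play and is not shown in the
sketch.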