08 February 2017
We describe a SIMD technique for drawing values from multiple discrete distributions,
such as sampling from the random variables of a mixture model,
that avoids computing a complete table of partial
sums of the relative probabilities. A table of alternate
(``butterfly-patterned'') form is faster to compute, making
better use of coalesced memory accesses; from this table,
complete partial sums are computed on the fly during a binary search.
Measurements using CUDA 7.5 on an NVIDIA Titan Black GPU show that
for double-precision data, this technique makes an entire LDA machine-learning application about 25% faster than doing a straightforward
matrix transposition after using coalesced accesses.
Venue : 2017 ACM Symposium on Principles and Practice of Parallel Programming
External Link: http://dl.acm.org/authorize?N21941