Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions
February 2017

Slides for a talk to be given at ACM PPoPP on February 8, 2017. This 25-minute talk builds on the paper as accepted by PPoPP (Archivist 2016-057) and a previous version of the slides presented at NVIDIA GTC 2016 (Archivist 2016-0055). *** We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses; from this table, complete partial sums are computed on the fly during a binary search. Measurements using CUDA 7.5 on an NVIDIA Titan Black GPU show that this technique makes an entire machine-learning application that uses a Latent Dirichlet Allocation topic model with 1024 topics is about 13% faster (when using single-precision floating-point data) or about 35% faster (when using double-precision floating-point data) than doing a straightforward matrix transposition after using coalesced accesses.

Authors: Guy Steele

Venue: 2017 ACM PPoPP conference


Hardware and Software, Engineered to Work Together