Nonparametric Bayesian Inference at Scale
Carnegie Mellon University
Oracle Fellowship Recipient
Oracle Principal Investigator
Guy Steele, Software Architect
Our goal is to address computational issues that arise when applying Bayesian nonparametric models,
such as topic models, clustering, and hierarchical representations, at industrial scale. The problem
is pertinent because model complexity (dimensionality, density, the number of clusters or topics,
or the depth of trees) increases with data size. Conventional inference algorithms scale superlinearly.
This proposal aims to address problems of large state spaces, large sample sizes, parallelization,
high dimensionality, and dense state. The tools are drawn from efficient data structures,
randomization, the power-law behavior of natural data, and insights from systems research.
Together, these allow us to narrow the gap between highly scalable modern information retrieval
and highly expressive Bayesian nonparametrics.
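
As a small illustration of how model complexity grows with data size, the sketch below computes the expected number of clusters under a Chinese restaurant process, a standard Bayesian nonparametric clustering prior. The choice of this particular prior, the function name expected_num_clusters, and the concentration parameter alpha are assumptions made for illustration; the proposal itself does not name a specific model.

```python
def expected_num_clusters(n, alpha=1.0):
    """Expected number of clusters after n observations under a
    Chinese restaurant process with concentration alpha (illustrative choice).

    Observation i + 1 (for i = 0, 1, ..., n - 1) opens a new cluster with
    probability alpha / (alpha + i), so the expectation is the sum of
    these probabilities.
    """
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    # The cluster count keeps growing as the data set grows.
    for n in (10**2, 10**4, 10**6):
        print(n, round(expected_num_clusters(n), 2))
```

Under this prior the expected number of clusters grows roughly logarithmically in the number of observations, so the state that an inference algorithm must track is never fixed in advance.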