Nonparametric Bayesian Inference at Scale

Project

Nonparametric Bayesian Inference at Scale

Principal Investigator

Alex Smola

Carnegie Mellon University

Oracle Fellowship Recipient

Manzil Zaheer

Oracle Principal Investigator

Guy Steele, Software Architect

Summary

Our goal is to address the computational issues that arise when applying Bayesian Nonparametrics,
such as topic models, clustering, and hierarchical representations, at industrial scale. The problem
is pertinent because model complexity (dimensionality, density, the number of clusters or topics,
or the depth of trees) increases with data size. Conventional inference algorithms scale superlinearly.
This proposal aims to address problems of large state spaces, large sample sizes, parallelization,
high dimensionality, and dense state. The tools are drawn from efficient data structures,
randomization, the power-law behavior of natural data, and insights from systems research.
This allows us to narrow the gap between highly scalable modern information retrieval and highly
expressive Bayesian Nonparametrics.