Bismark Project Phase II – Big Data Feature Engineering



Principal Investigator

Chris Re

Stanford University

Oracle Fellowship Recipient

Burak Yuvaz

Oracle Principal Investigator

Eric Sedlar, Vice President and Technical Director, Oracle Labs
Vaishnavi Sashikanth


Given the success of this engagement, we collaborated to move further up the analytics workflow and address the problem of Big Data feature engineering, a step that precedes the application of learning algorithms such as IGD. A typical predictive modeling exercise aims for a parsimonious model that also achieves the highest possible prediction accuracy. Omitting relevant effects, including irrelevant ones, and working under time constraints that limit how much of the space of variable combinations can be explored often lead to sub-optimal models. The problem is exacerbated when the number of variables is large (more than 50), at which point the interactive dialog between a predictive engine and a human analyst breaks down. We propose to address this problem as Phase II of the Bismark project.