Bismark Project Phase II – Big Data Feature Engineering
Project
Bismark Project Phase II – Big Data Feature Engineering
Principal Investigator
Chris Re
Oracle Fellowship Recipient
Burak Yuvaz
Oracle Principal Investigators
Eric Sedlar, Senior Vice President and Technical Director Oracle Labs
Vaishnavi Sashikanth
Summary
Given the success of this engagement, we collaborated to move further up the analytics workflow and address the problem of Big Data feature engineering, a step that precedes the application of learning algorithms such as IGD. A typical predictive modeling exercise aims for a parsimonious model that also has the highest possible prediction accuracy. Omitting relevant effects, including irrelevant effects, and time constraints that limit how much of the space of variable combinations can be explored often result in sub-optimal models. The problem is exacerbated when the number of variables is large (>50), at which point the interactive dialog between a predictive engine and a human simply falls apart. We propose to address this problem as Phase II of the Bismark project.