Optimizing iterative data analysis programs

Principal Investigator

Volker Markl

Database Systems and Information Management Group, Technische Universität Berlin

Oracle Principal Investigator

Laurent DAYNES, Senior Architect

Summary

Analyzing big data requires executing advanced algorithms from fields such as machine learning, optimization, text mining, and signal processing. These algorithms go beyond relational operations: specifying them often requires user-defined functions, iterations, and stateful operations. The absence of declarative languages for big data analytics prevents automatic optimization and parallelization, forcing data scientists to be systems programmers at the same time. This limits the use of advanced analytics on big data to a small set of experts who have both data analysis and systems programming skills, and it makes the development of data analysis programs expensive, inefficient, and time-consuming. The goal of our project is to remove the need for systems programming skills in data analytics by enabling data scientists to describe data analysis programs with declarative specifications. We will address this problem by researching means of automatically optimizing and parallelizing data analysis programs that include iterations and user-defined functions alongside relational operators.