Fast R

Project

Fast R

Principal Investigator

Jan Vitek

Purdue University

Oracle Fellowship Recipient

Lei Zhao, Leo Osvald, Prahlad Joshi, Roman Tsegelskyi

Oracle Principal Investigator

Mario Wolczko, Architect

Summary

Scripting languages, like R, are lightweight, dynamic programming languages that maximize productivity by offering high-level abstractions. However, they lose their appeal when requirements stabilize and projects enter their deployment phase. The compromises made to reduce development time make it hard to scale to large data sets or to perform computationally intensive tasks. This is certainly the case for R. R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. Released in 1995 under a GNU GPL license, R rapidly became the lingua franca for statistical data analysis. Today, there are over 4,000 packages available from repositories such as CRAN and Bioconductor. The R-Forge web site lists 1,102 projects. The community comprises 55 user groups, about 2,000 package developers, and over 2 million end users. R’s attraction comes from the myriad open source libraries available on the Internet, and from its dynamic and interactive nature. Unlike earlier scientific computing languages, which enforced a strong distinction between programming and execution, R lets users manipulate data and experiment with formulas with immediate feedback. Unfortunately, as observed by the original authors of R, the application of cutting-edge statistical methodology is limited by the capabilities of the system in which it is implemented. R simply does not scale to the larger problems of interest in practice. The goals of this project are to improve the R execution environment and to collaborate with Oracle Labs in the development of a sustainable, next-generation R environment that will be accepted by the open source R community and provide increased levels of performance and scalability.