Exploiting Provenance to Enhance Data Management

Project

Exploiting Provenance to Enhance Data Management

Principal Investigator

Boris Glavic

Illinois Institute of Technology

Oracle Fellowship Recipients

Seokki Lee, Xing Niu

Oracle Principal Investigators

Dieter Gawlick
Vasudha Krishnaswamy
Zhen Hua Liu

Summary

Provenance research has operated under the assumption that provenance information will be consumed by humans to support use cases such as auditing and debugging. In this project, we investigate novel applications of provenance. These range from low-level systems support, e.g., improving query processing performance and reducing resource usage, to high-level business support, e.g., assessing the value of data based on its provenance. To support these new use cases, we will develop low-overhead mechanisms for capturing coarse-grained provenance (e.g., at the level of disk blocks). We will then investigate how such provenance information can be used for a wide range of query execution, query optimization, and self-tuning tasks. Our initial focus is on data skipping and caching. Furthermore, we will build a general framework for assessing data value that combines provenance with query frequency to determine the importance of data with respect to a workload.
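The two initial directions in the summary are block-level provenance for data skipping and provenance-based data value. Both can be illustrated with a minimal Python sketch. This is not the project's design. The in-memory `BlockStore`, the `ProvenanceSkipper` class, the caller-supplied query ids, and the frequency-weighted value score are all hypothetical simplifications. Skipping is only sound here because the table is assumed to be read-only. A real system would also need to invalidate recorded provenance when blocks change.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Row = Tuple[int, int]          # hypothetical (key, value) rows
Pred = Callable[[Row], bool]   # a selection predicate


class BlockStore:
    """A read-only table split into fixed-size blocks (hypothetical model)."""

    def __init__(self, rows: List[Row], block_size: int) -> None:
        self.blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
        self.blocks_read = 0  # counts block reads, a stand-in for I/O cost

    def scan(self, pred: Pred, only: Optional[FrozenSet[int]] = None):
        """Evaluate a selection and capture coarse-grained provenance:
        the ids of the blocks that contributed at least one result row."""
        ids = range(len(self.blocks)) if only is None else sorted(only)
        result: List[Row] = []
        prov = set()
        for bid in ids:
            self.blocks_read += 1
            for row in self.blocks[bid]:
                if pred(row):
                    result.append(row)
                    prov.add(bid)
        return result, frozenset(prov)


class ProvenanceSkipper:
    """Reuses captured provenance to skip blocks when a query repeats.
    Assumption: the same query id always denotes the same predicate."""

    def __init__(self, store: BlockStore) -> None:
        self.store = store
        self.prov: Dict[str, FrozenSet[int]] = {}  # query id -> provenance
        self.freq: Counter = Counter()             # query id -> run count

    def run(self, qid: str, pred: Pred) -> List[Row]:
        self.freq[qid] += 1
        if qid in self.prov:
            # Data skipping: because the table is read-only, no block outside
            # the recorded provenance can hold a row that satisfies pred.
            result, _ = self.store.scan(pred, only=self.prov[qid])
        else:
            # First execution: full scan that also records provenance.
            result, self.prov[qid] = self.store.scan(pred)
        return result

    def block_value(self) -> Dict[int, int]:
        """Hypothetical data-value score: for each block, the sum of the run
        counts of all queries whose provenance includes that block."""
        value = {bid: 0 for bid in range(len(self.store.blocks))}
        for qid, blocks in self.prov.items():
            for bid in blocks:
                value[bid] += self.freq[qid]
        return value


# Usage: 100 rows in 10 blocks of 10 rows each.
store = BlockStore([(k, k % 10) for k in range(100)], block_size=10)
db = ProvenanceSkipper(store)
db.run("q_small_keys", lambda r: r[0] < 15)   # full scan: 10 block reads
db.run("q_small_keys", lambda r: r[0] < 15)   # skips to blocks 0 and 1: 2 reads
db.run("q_high_keys", lambda r: r[0] >= 95)   # full scan: 10 block reads
scores = db.block_value()                     # blocks 0, 1 -> 2; block 9 -> 1
```

Weighting each block by how often the queries that depend on it are run is only one plausible way to combine provenance with query frequency. A fuller framework might also account for the share of a result each block contributes, or for the cost of recomputing it.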