Query Performance Improvement with Machine Learning on Multicore

Project

Query Performance Improvement with Machine Learning on Multicore

Principal Investigator

Saman Amarasinghe, Victor Pankratius

MIT

Summary

Multicore systems offer new potential for better database query performance with intra-query parallelism, i.e., there are new opportunities for exploiting threading within a query and introducing task parallelism, data parallelism, and pipeline parallelism. Especially Oracle’s hardware with 16 or more cores has the means to make large and complex queries run faster on the same machine, which contrasts past approaches on clusters where parallelism was achieved by dispatching sequential queries onto different nodes. Due to the variety of today’s multicore hardware, however, there is no simple “one size fits all” technique that would work well different platforms. This project tackles this problem using a dynamic auto-tuning approach with machine learning. We do not require a rewrite of the database optimizer. Our approach builds upon a previous collaboration (Felix Schmidt’s Master’s thesis, advised by Victor Pankratius and supervised by Gilda Garreton at Oracle) where multithreading has been adaptively added to boost join-intensive queries. Adding intra-query parallelism can be used in conjunction with other query optimizations. Experiments have shown that it is possible to cluster queries with similar plans based on a metric that takes query properties into account; then, a set of similar transformations introducing parallelism can be applied to each cluster of queries. The results have raised interest in a group inside the Optimization organization at Oracle. For them, the similarity work opens new opportunities on how to monitor performance of large queries in the long run, which can be evaluated by this proposal.