Scalable Software Code Representation, Search and Classification

Principal Investigator

Shlomo Geva

Queensland University of Technology

Oracle Fellowship Recipient

Timothy Chappell

Oracle Principal Investigator

Cristina Cifuentes, Vice President, Software Assurance

Summary

The goal of this research is to investigate how effective document signature approaches are for source code classification tasks. We want to determine whether document signature techniques can produce a representation of source code that preserves semantic similarity well enough to retrieve potentially matching source code segments quickly. If they can, the signatures can be combined with highly efficient approximate retrieval methods to classify source code segments against an existing database of classified source code segments.
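
To make the idea concrete, the following is a minimal sketch of signature-based classification, not the project's actual method. It assumes a simple random-projection scheme: each token is hashed to a pseudo-random vector of +1/-1 values, the vectors are summed over the code segment, and the sign of each sum becomes one bit of the signature. Classification is a nearest-neighbour vote by Hamming distance. The tokenizer, the 256-bit width, and the names `tokenize`, `term_vector`, `signature`, `hamming` and `classify` are all hypothetical choices made for illustration.

```python
import hashlib
import re
from collections import Counter

SIG_BITS = 256  # assumed signature width in bits; must be a multiple of 8


def tokenize(code):
    """Very rough tokenizer (an assumption): identifiers, numbers, other symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def term_vector(token, bits=SIG_BITS):
    """Map a token to a deterministic pseudo-random vector of +1/-1 values."""
    digest = hashlib.shake_256(token.encode("utf-8")).digest(bits // 8)
    value = int.from_bytes(digest, "big")
    return [1 if (value >> i) & 1 else -1 for i in range(bits)]


def signature(code, bits=SIG_BITS):
    """Sum the token vectors and keep only the sign of each position as a bit."""
    totals = [0] * bits
    for token in tokenize(code):
        for i, v in enumerate(term_vector(token, bits)):
            totals[i] += v
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def classify(query_code, labelled, k=3):
    """Majority label among the k stored signatures closest to the query.

    `labelled` is a list of (signature, label) pairs built in advance.
    """
    q = signature(query_code)
    nearest = sorted(labelled, key=lambda item: hamming(q, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Tiny hypothetical database of already-classified segments.
SNIPPETS = [
    ("for (int i = 0; i < n; i++) { sum += a[i]; }", "loop"),
    ("char *p = malloc(n); strcpy(p, src);", "memory"),
    ("while (i < n) { total += values[i]; i++; }", "loop"),
]
DATABASE = [(signature(code), label) for code, label in SNIPPETS]
```

A query such as `classify("for (int j = 0; j < m; j++) { sum += b[j]; }", DATABASE, k=1)` returns the label of the closest stored segment. This sketch scans every stored signature linearly; the approximate retrieval methods mentioned above would replace that scan with an index so that only a small fraction of the database is compared.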

We believe this research will be of value to Oracle because existing source code analysis tools have high processing requirements. If signature classification is effective enough to reduce the amount of code that must be analyzed rigorously by traditional methods, which are accurate but computationally more expensive, then adopting this approach will produce substantial performance gains for source code classification.