Scalable Software Code Representation, Search and Classification
Principal Investigator
Queensland University of Technology
Oracle Fellowship Recipient
Timothy Chappell
Oracle Principal Investigator
Cristina Cifuentes, Vice President, Software Assurance
Summary
The goal of this research is to investigate how effective document signature approaches are for source code classification tasks. We want to determine whether document signature techniques can produce a representation of source code that preserves semantic similarity well enough to quickly retrieve potentially matching source code segments. If they can, the signatures can be combined with highly efficient approximate retrieval methods to classify source code segments against an existing database of already classified segments.
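To make the idea concrete, the following is a minimal sketch of signature-based classification. It is not the project's implementation. The summary does not say which signature scheme is used, so the sketch assumes a random-projection style signature: each token is hashed to a vector of +1/-1 values, the vectors are summed, and only the sign of each component is kept. A linear scan over Hamming distances stands in for the approximate retrieval method, and all names (SIGNATURE_BITS, token_vector, signature, hamming, classify) are hypothetical.

```python
import hashlib
import re
from collections import Counter

# Assumed signature width; the project summary does not specify one.
SIGNATURE_BITS = 256


def token_vector(token):
    """Map a token to a deterministic pseudo-random vector of +1/-1 values."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()  # 32 bytes = 256 bits
    bits = int.from_bytes(digest, "big")
    return [1 if (bits >> i) & 1 else -1 for i in range(SIGNATURE_BITS)]


def signature(code):
    """Sum the token vectors and keep one bit per component (1 if the sum is positive)."""
    totals = [0] * SIGNATURE_BITS
    # Crude lexer (an assumption): identifiers, integer literals, single symbols.
    for token in re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code):
        for i, value in enumerate(token_vector(token)):
            totals[i] += value
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def classify(code, labelled, k=3):
    """Majority label among the k stored signatures closest to the query.

    `labelled` is a list of (signature, label) pairs. The exhaustive scan here
    is a placeholder for an approximate nearest-neighbour index.
    """
    query = signature(code)
    nearest = sorted(labelled, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because segments that share most of their tokens produce signatures that differ in few bits, a query segment can be labelled from its nearest stored signatures without running a full analysis on every candidate.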
We believe this research will be of value to Oracle because existing source code analysis tools have high processing requirements. If signature classification is effective enough to reduce the amount of code that must be analyzed more rigorously by traditional methods, which are accurate but computationally more expensive, then adopting this approach will produce substantial performance gains for source code classification.