Machine Learning For Finding Bugs in Source Code: An Initial Report
December 2016
Static program analysis is a technique to analyse code without executing it, and can be used to find bugs in source code. Many open source and commercial tools have been developed in this space over the past 20 years. Two properties matter most for the deployment of static code analysis tools, namely the precision of the technique and its scalability: numerous false positives and slow runtime both make a tool hard for development teams to adopt, given that integration into a nightly build is the standard goal.
In this paper we report our findings on using machine learning techniques to detect defects in C programs. We apply three off-the-shelf machine learning techniques to a large corpus of programs, which is used for both training and evaluation. We compare the results produced by the machine learning techniques against those of the Parfait static program analysis tool used internally at Oracle by thousands of developers.
While the initial results were encouraging on the surface, further investigation suggests that the machine learning techniques we used are not suitable replacements for static program analysis tools because of the low precision of their results. This could be due to a variety of reasons, including not using domain knowledge and a lack of suitable training data.
Time: Dec 23, 07:10 GMT
Authors: Timothy Chappell, Cristina Cifuentes, Padmanabhan Krishnan, Shlomo Geva
Venue: MaLTeSQuE: Workshop on Machine Learning Techniques for Software Quality Evaluation