Machine Learning for Finding Bugs: An Initial Report
Timothy Chappell, Cristina Cifuentes, Paddy Krishnan, Shlomo Geva
21 February 2017
Static program analysis is a technique to analyse code without executing it, and can be used to find bugs in source code. Many open source and commercial tools have been developed in this space over the past 20 years. Scalability and precision are of importance for the deployment of static code analysis tools - numerous false positives and slow runtime both make the tool hard to be used by development, where integration into a nightly build is the standard goal. This requires one to identify a suitable abstraction for the static analysis which is typically a manual process and can be expensive. In this paper we report our findings on using machine learning techniques to detect defects in C programs. We use three offthe- shelf machine learning techniques and use a large corpus of programs available for use in both the training and evaluation of the results. We compare the results produced by the machine learning technique against the Parfait static program analysis tool used internally at Oracle by thousands of developers. While on the surface the initial results were encouraging, further investigation suggests that the machine learning techniques we used are not suitable replacements for static program analysis tools due to low precision of the results. This could be due to a variety of reasons including not using domain knowledge such as the semantics of the programming language and lack of suitable data used in the training process.
Venue : IEEE Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE), Klagenfurt, Austria.
External Link: http://ieeexplore.ieee.org/document/7882012/